* [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-28 14:44 ` Arnaldo Carvalho de Melo
2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
Old kernels have task_struct which contains "state" field and newer
kernels have "__state". While the get_task_state() in the BPF code
handles that in some way, it assumed the current kernel has the new
definition and it caused a build error on old kernels.
We should not assume anything and access them carefully. Do not use
the task struct directly and access them using new and old definitions
in a row.
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_skel/off_cpu.bpf.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 792ae2847080..cc6d7fd55118 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -71,6 +71,11 @@ struct {
__uint(max_entries, 1);
} cgroup_filter SEC(".maps");
+/* new kernel task_struct definition */
+struct task_struct___new {
+ long __state;
+} __attribute__((preserve_access_index));
+
/* old kernel task_struct definition */
struct task_struct___old {
long state;
@@ -93,14 +98,17 @@ const volatile bool uses_cgroup_v1 = false;
*/
static inline int get_task_state(struct task_struct *t)
{
- if (bpf_core_field_exists(t->__state))
- return BPF_CORE_READ(t, __state);
+ /* recast pointer to capture new type for compiler */
+ struct task_struct___new *t_new = (void *)t;
- /* recast pointer to capture task_struct___old type for compiler */
- struct task_struct___old *t_old = (void *)t;
+ if (bpf_core_field_exists(t_new->__state)) {
+ return BPF_CORE_READ(t_new, __state);
+ } else {
+ /* recast pointer to capture old type for compiler */
+ struct task_struct___old *t_old = (void *)t;
- /* now use old "state" name of the field */
- return BPF_CORE_READ(t_old, state);
+ return BPF_CORE_READ(t_old, state);
+ }
}
static inline __u64 get_cgroup_id(struct task_struct *t)
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
@ 2022-06-28 14:44 ` Arnaldo Carvalho de Melo
2022-06-28 16:52 ` Namhyung Kim
0 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:44 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
Blake Jones
Em Fri, Jun 24, 2022 at 04:13:08PM -0700, Namhyung Kim escreveu:
> Old kernels have task_struct which contains "state" field and newer
> kernels have "__state". While the get_task_state() in the BPF code
> handles that in some way, it assumed the current kernel has the new
> definition and it caused a build error on old kernels.
>
> We should not assume anything and access them carefully. Do not use
> the task struct directly and access them using new and old definitions
> in a row.
I added a:
Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF")
Ok?
- Arnaldo
> Reported-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index 792ae2847080..cc6d7fd55118 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -71,6 +71,11 @@ struct {
> __uint(max_entries, 1);
> } cgroup_filter SEC(".maps");
>
> +/* new kernel task_struct definition */
> +struct task_struct___new {
> + long __state;
> +} __attribute__((preserve_access_index));
> +
> /* old kernel task_struct definition */
> struct task_struct___old {
> long state;
> @@ -93,14 +98,17 @@ const volatile bool uses_cgroup_v1 = false;
> */
> static inline int get_task_state(struct task_struct *t)
> {
> - if (bpf_core_field_exists(t->__state))
> - return BPF_CORE_READ(t, __state);
> + /* recast pointer to capture new type for compiler */
> + struct task_struct___new *t_new = (void *)t;
>
> - /* recast pointer to capture task_struct___old type for compiler */
> - struct task_struct___old *t_old = (void *)t;
> + if (bpf_core_field_exists(t_new->__state)) {
> + return BPF_CORE_READ(t_new, __state);
> + } else {
> + /* recast pointer to capture old type for compiler */
> + struct task_struct___old *t_old = (void *)t;
>
> - /* now use old "state" name of the field */
> - return BPF_CORE_READ(t_old, state);
> + return BPF_CORE_READ(t_old, state);
> + }
> }
>
> static inline __u64 get_cgroup_id(struct task_struct *t)
> --
> 2.37.0.rc0.161.g10f37bed90-goog
--
- Arnaldo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/6] perf offcpu: Fix a build failure on old kernels
2022-06-28 14:44 ` Arnaldo Carvalho de Melo
@ 2022-06-28 16:52 ` Namhyung Kim
0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-28 16:52 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
Blake Jones
On Tue, Jun 28, 2022 at 7:44 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Fri, Jun 24, 2022 at 04:13:08PM -0700, Namhyung Kim escreveu:
> > Old kernels have task_struct which contains "state" field and newer
> > kernels have "__state". While the get_task_state() in the BPF code
> > handles that in some way, it assumed the current kernel has the new
> > definition and it caused a build error on old kernels.
> >
> > We should not assume anything and access them carefully. Do not use
> > the task struct directly and access them using new and old definitions
> > in a row.
>
> I added a:
>
> Fixes: edc41a1099c2d08c ("perf record: Enable off-cpu analysis with BPF")
>
> Ok?
Sure, thanks for doing this!
Namhyung
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/6] perf offcpu: Accept allowed sample types only
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-28 14:46 ` Arnaldo Carvalho de Melo
2022-06-24 23:13 ` [PATCH 3/6] perf offcpu: Check process id for the given workload Namhyung Kim
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
As offcpu-time event is synthesized at the end, it could not get the
all the sample info. Define OFFCPU_SAMPLE_TYPES for allowed ones and
mask out others in evsel__config() to prevent parse errors.
Because perf sample parsing assumes a specific ordering with the
sample types, setting unsupported one would make it fail to read
data like perf record -d/--data.
Fixes: edc41a1099c2 ("perf record: Enable off-cpu analysis with BPF")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 7 ++++++-
tools/perf/util/evsel.c | 9 +++++++++
tools/perf/util/off_cpu.h | 9 +++++++++
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index b73e84a02264..f289b7713598 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -265,6 +265,12 @@ int off_cpu_write(struct perf_session *session)
sample_type = evsel->core.attr.sample_type;
+ if (sample_type & ~OFFCPU_SAMPLE_TYPES) {
+ pr_err("not supported sample type: %llx\n",
+ (unsigned long long)sample_type);
+ return -1;
+ }
+
if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) {
if (evsel->core.id)
sid = evsel->core.id[0];
@@ -319,7 +325,6 @@ int off_cpu_write(struct perf_session *session)
}
if (sample_type & PERF_SAMPLE_CGROUP)
data.array[n++] = key.cgroup_id;
- /* TODO: handle more sample types */
size = n * sizeof(u64);
data.hdr.size = size;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ce499c5da8d7..094b0a9c0bc0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -48,6 +48,7 @@
#include "util.h"
#include "hashmap.h"
#include "pmu-hybrid.h"
+#include "off_cpu.h"
#include "../perf-sys.h"
#include "util/parse-branch-options.h"
#include <internal/xyarray.h>
@@ -1102,6 +1103,11 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
}
}
+static bool evsel__is_offcpu_event(struct evsel *evsel)
+{
+ return evsel__is_bpf_output(evsel) && !strcmp(evsel->name, OFFCPU_EVENT);
+}
+
/*
* The enable_on_exec/disabled value strategy:
*
@@ -1366,6 +1372,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
*/
if (evsel__is_dummy_event(evsel))
evsel__reset_sample_bit(evsel, BRANCH_STACK);
+
+ if (evsel__is_offcpu_event(evsel))
+ evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
}
int evsel__set_filter(struct evsel *evsel, const char *filter)
diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
index 548008f74d42..2dd67c60f211 100644
--- a/tools/perf/util/off_cpu.h
+++ b/tools/perf/util/off_cpu.h
@@ -1,6 +1,8 @@
#ifndef PERF_UTIL_OFF_CPU_H
#define PERF_UTIL_OFF_CPU_H
+#include <linux/perf_event.h>
+
struct evlist;
struct target;
struct perf_session;
@@ -8,6 +10,13 @@ struct record_opts;
#define OFFCPU_EVENT "offcpu-time"
+#define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
+ PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
+ PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
+ PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
+ PERF_SAMPLE_CGROUP)
+
+
#ifdef HAVE_BPF_SKEL
int off_cpu_prepare(struct evlist *evlist, struct target *target,
struct record_opts *opts);
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/6] perf offcpu: Accept allowed sample types only
2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
@ 2022-06-28 14:46 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:46 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
Blake Jones
Em Fri, Jun 24, 2022 at 04:13:09PM -0700, Namhyung Kim escreveu:
> As offcpu-time event is synthesized at the end, it could not get the
> all the sample info. Define OFFCPU_SAMPLE_TYPES for allowed ones and
> mask out others in evsel__config() to prevent parse errors.
>
> Because perf sample parsing assumes a specific ordering with the
> sample types, setting unsupported one would make it fail to read
> data like perf record -d/--data.
> Fixes: edc41a1099c2 ("perf record: Enable off-cpu analysis with BPF")
Thanks, applied.
- Arnaldo
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/bpf_off_cpu.c | 7 ++++++-
> tools/perf/util/evsel.c | 9 +++++++++
> tools/perf/util/off_cpu.h | 9 +++++++++
> 3 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index b73e84a02264..f289b7713598 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -265,6 +265,12 @@ int off_cpu_write(struct perf_session *session)
>
> sample_type = evsel->core.attr.sample_type;
>
> + if (sample_type & ~OFFCPU_SAMPLE_TYPES) {
> + pr_err("not supported sample type: %llx\n",
> + (unsigned long long)sample_type);
> + return -1;
> + }
> +
> if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) {
> if (evsel->core.id)
> sid = evsel->core.id[0];
> @@ -319,7 +325,6 @@ int off_cpu_write(struct perf_session *session)
> }
> if (sample_type & PERF_SAMPLE_CGROUP)
> data.array[n++] = key.cgroup_id;
> - /* TODO: handle more sample types */
>
> size = n * sizeof(u64);
> data.hdr.size = size;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index ce499c5da8d7..094b0a9c0bc0 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -48,6 +48,7 @@
> #include "util.h"
> #include "hashmap.h"
> #include "pmu-hybrid.h"
> +#include "off_cpu.h"
> #include "../perf-sys.h"
> #include "util/parse-branch-options.h"
> #include <internal/xyarray.h>
> @@ -1102,6 +1103,11 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
> }
> }
>
> +static bool evsel__is_offcpu_event(struct evsel *evsel)
> +{
> + return evsel__is_bpf_output(evsel) && !strcmp(evsel->name, OFFCPU_EVENT);
> +}
> +
> /*
> * The enable_on_exec/disabled value strategy:
> *
> @@ -1366,6 +1372,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
> */
> if (evsel__is_dummy_event(evsel))
> evsel__reset_sample_bit(evsel, BRANCH_STACK);
> +
> + if (evsel__is_offcpu_event(evsel))
> + evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES;
> }
>
> int evsel__set_filter(struct evsel *evsel, const char *filter)
> diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
> index 548008f74d42..2dd67c60f211 100644
> --- a/tools/perf/util/off_cpu.h
> +++ b/tools/perf/util/off_cpu.h
> @@ -1,6 +1,8 @@
> #ifndef PERF_UTIL_OFF_CPU_H
> #define PERF_UTIL_OFF_CPU_H
>
> +#include <linux/perf_event.h>
> +
> struct evlist;
> struct target;
> struct perf_session;
> @@ -8,6 +10,13 @@ struct record_opts;
>
> #define OFFCPU_EVENT "offcpu-time"
>
> +#define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \
> + PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
> + PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \
> + PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \
> + PERF_SAMPLE_CGROUP)
> +
> +
> #ifdef HAVE_BPF_SKEL
> int off_cpu_prepare(struct evlist *evlist, struct target *target,
> struct record_opts *opts);
> --
> 2.37.0.rc0.161.g10f37bed90-goog
--
- Arnaldo
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 3/6] perf offcpu: Check process id for the given workload
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
2022-06-24 23:13 ` [PATCH 1/6] perf offcpu: Fix a build failure on old kernels Namhyung Kim
2022-06-24 23:13 ` [PATCH 2/6] perf offcpu: Accept allowed sample types only Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-24 23:13 ` [PATCH 4/6] perf offcpu: Parse process id separately Namhyung Kim
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
Current task filter checks task->pid which is different for each
thread. But we want to profile all the threads in the process. So
let's compare process id (or thread-group id: tgid) instead.
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 2
After:
$ sudo perf record --off-cpu -- perf bench sched messaging -t
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 850
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 1 +
tools/perf/util/bpf_skel/off_cpu.bpf.c | 8 +++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f289b7713598..7dbcb025da87 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -78,6 +78,7 @@ static void off_cpu_start(void *arg)
u8 val = 1;
skel->bss->has_task = 1;
+ skel->bss->uses_tgid = 1;
fd = bpf_map__fd(skel->maps.task_filter);
pid = perf_thread_map__pid(evlist->core.threads, 0);
bpf_map_update_elem(fd, &pid, &val, BPF_ANY);
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index cc6d7fd55118..143a8b7acf87 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -85,6 +85,7 @@ int enabled = 0;
int has_cpu = 0;
int has_task = 0;
int has_cgroup = 0;
+int uses_tgid = 0;
const volatile bool has_prev_state = false;
const volatile bool needs_cgroup = false;
@@ -144,7 +145,12 @@ static inline int can_record(struct task_struct *t, int state)
if (has_task) {
__u8 *ok;
- __u32 pid = t->pid;
+ __u32 pid;
+
+ if (uses_tgid)
+ pid = t->tgid;
+ else
+ pid = t->pid;
ok = bpf_map_lookup_elem(&task_filter, &pid);
if (!ok)
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/6] perf offcpu: Parse process id separately
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
` (2 preceding siblings ...)
2022-06-24 23:13 ` [PATCH 3/6] perf offcpu: Check process id for the given workload Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-24 23:13 ` [PATCH 5/6] perf offcpu: Track child processes Namhyung Kim
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
The current target code uses thread id for tracking tasks because
perf_events need to be opened for each task. But we can use tgid in
BPF maps and check it easily.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 45 +++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 7dbcb025da87..f7ee0c7a53f0 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -11,6 +11,7 @@
#include "util/cpumap.h"
#include "util/thread_map.h"
#include "util/cgroup.h"
+#include "util/strlist.h"
#include <bpf/bpf.h>
#include "bpf_skel/off_cpu.skel.h"
@@ -125,6 +126,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
{
int err, fd, i;
int ncpus = 1, ntasks = 1, ncgrps = 1;
+ struct strlist *pid_slist = NULL;
+ struct str_node *pos;
if (off_cpu_config(evlist) < 0) {
pr_err("Failed to config off-cpu BPF event\n");
@@ -143,7 +146,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
}
- if (target__has_task(target)) {
+ if (target->pid) {
+ pid_slist = strlist__new(target->pid, NULL);
+ if (!pid_slist) {
+ pr_err("Failed to create a strlist for pid\n");
+ return -1;
+ }
+
+ ntasks = 0;
+ strlist__for_each_entry(pos, pid_slist) {
+ char *end_ptr;
+ int pid = strtol(pos->s, &end_ptr, 10);
+
+ if (pid == INT_MIN || pid == INT_MAX ||
+ (*end_ptr != '\0' && *end_ptr != ','))
+ continue;
+
+ ntasks++;
+ }
+ bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+ } else if (target__has_task(target)) {
ntasks = perf_thread_map__nr(evlist->core.threads);
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
}
@@ -185,7 +207,26 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
}
}
- if (target__has_task(target)) {
+ if (target->pid) {
+ u8 val = 1;
+
+ skel->bss->has_task = 1;
+ skel->bss->uses_tgid = 1;
+ fd = bpf_map__fd(skel->maps.task_filter);
+
+ strlist__for_each_entry(pos, pid_slist) {
+ char *end_ptr;
+ u32 tgid;
+ int pid = strtol(pos->s, &end_ptr, 10);
+
+ if (pid == INT_MIN || pid == INT_MAX ||
+ (*end_ptr != '\0' && *end_ptr != ','))
+ continue;
+
+ tgid = pid;
+ bpf_map_update_elem(fd, &tgid, &val, BPF_ANY);
+ }
+ } else if (target__has_task(target)) {
u32 pid;
u8 val = 1;
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/6] perf offcpu: Track child processes
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
` (3 preceding siblings ...)
2022-06-24 23:13 ` [PATCH 4/6] perf offcpu: Parse process id separately Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-24 23:13 ` [PATCH 6/6] perf offcpu: Update offcpu test for child process Namhyung Kim
2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
When -p option used or a workload is given, it needs to handle child
processes. The perf_event can inherit those task events
automatically. We can add a new BPF program in task_newtask
tracepoint to track child processes.
Before:
$ sudo perf record --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 1
After:
$ sudo perf record -a --off-cpu -- perf bench sched messaging
$ sudo perf report --stat | grep -A1 offcpu
offcpu-time stats:
SAMPLE events: 856
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/bpf_off_cpu.c | 7 ++++++
tools/perf/util/bpf_skel/off_cpu.bpf.c | 30 ++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index f7ee0c7a53f0..c257813e674e 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -17,6 +17,7 @@
#include "bpf_skel/off_cpu.skel.h"
#define MAX_STACKS 32
+#define MAX_PROC 4096
/* we don't need actual timestamp, just want to put the samples at last */
#define OFF_CPU_TIMESTAMP (~0ull << 32)
@@ -164,10 +165,16 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
ntasks++;
}
+
+ if (ntasks < MAX_PROC)
+ ntasks = MAX_PROC;
+
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
} else if (target__has_task(target)) {
ntasks = perf_thread_map__nr(evlist->core.threads);
bpf_map__set_max_entries(skel->maps.task_filter, ntasks);
+ } else if (target__none(target)) {
+ bpf_map__set_max_entries(skel->maps.task_filter, MAX_PROC);
}
if (evlist__first(evlist)->cgrp) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 143a8b7acf87..c4ba2bcf179f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -12,6 +12,9 @@
#define TASK_INTERRUPTIBLE 0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
+/* create a new thread */
+#define CLONE_THREAD 0x10000
+
#define MAX_STACKS 32
#define MAX_ENTRIES 102400
@@ -220,6 +223,33 @@ static int off_cpu_stat(u64 *ctx, struct task_struct *prev,
return 0;
}
+SEC("tp_btf/task_newtask")
+int on_newtask(u64 *ctx)
+{
+ struct task_struct *task;
+ u64 clone_flags;
+ u32 pid;
+ u8 val = 1;
+
+ if (!uses_tgid)
+ return 0;
+
+ task = (struct task_struct *)bpf_get_current_task();
+
+ pid = BPF_CORE_READ(task, tgid);
+ if (!bpf_map_lookup_elem(&task_filter, &pid))
+ return 0;
+
+ task = (struct task_struct *)ctx[0];
+ clone_flags = ctx[1];
+
+ pid = task->tgid;
+ if (!(clone_flags & CLONE_THREAD))
+ bpf_map_update_elem(&task_filter, &pid, &val, BPF_NOEXIST);
+
+ return 0;
+}
+
SEC("tp_btf/sched_switch")
int on_switch(u64 *ctx)
{
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 6/6] perf offcpu: Update offcpu test for child process
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
` (4 preceding siblings ...)
2022-06-24 23:13 ` [PATCH 5/6] perf offcpu: Track child processes Namhyung Kim
@ 2022-06-24 23:13 ` Namhyung Kim
2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
6 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-24 23:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, linux-perf-users,
Song Liu, Hao Luo, Milian Wolff, bpf, Blake Jones
Record off-cpu data with perf bench sched messaging workload and count
the number of offcpu-time events. Also update the test script not to
run next tests if failed already and revise the error messages.
$ sudo ./perf test offcpu -v
88: perf record offcpu profiling tests :
--- start ---
test child forked, pid 344780
Checking off-cpu privilege
Basic off-cpu test
Basic off-cpu test [Success]
Child task off-cpu test
Child task off-cpu test [Success]
test child finished with 0
---- end ----
perf record offcpu profiling tests: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++++---
1 file changed, 50 insertions(+), 7 deletions(-)
diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh
index 96e0739f7478..611b2d7c779b 100755
--- a/tools/perf/tests/shell/record_offcpu.sh
+++ b/tools/perf/tests/shell/record_offcpu.sh
@@ -19,20 +19,26 @@ trap_cleanup() {
}
trap trap_cleanup exit term int
-test_offcpu() {
- echo "Basic off-cpu test"
+test_offcpu_priv() {
+ echo "Checking off-cpu privilege"
+
if [ `id -u` != 0 ]
then
- echo "Basic off-cpu test [Skipped permission]"
+ echo "off-cpu test [Skipped permission]"
err=2
return
fi
- if perf record --off-cpu -o ${perfdata} --quiet true 2>&1 | grep BUILD_BPF_SKEL
+ if perf record --off-cpu -o /dev/null --quiet true 2>&1 | grep BUILD_BPF_SKEL
then
- echo "Basic off-cpu test [Skipped missing BPF support]"
+ echo "off-cpu test [Skipped missing BPF support]"
err=2
return
fi
+}
+
+test_offcpu_basic() {
+ echo "Basic off-cpu test"
+
if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null
then
echo "Basic off-cpu test [Failed record]"
@@ -41,7 +47,7 @@ test_offcpu() {
fi
if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
then
- echo "Basic off-cpu test [Failed record]"
+ echo "Basic off-cpu test [Failed no event]"
err=1
return
fi
@@ -54,7 +60,44 @@ test_offcpu() {
echo "Basic off-cpu test [Success]"
}
-test_offcpu
+test_offcpu_child() {
+ echo "Child task off-cpu test"
+
+ # perf bench sched messaging creates 400 processes
+ if ! perf record --off-cpu -e dummy -o ${perfdata} -- \
+ perf bench sched messaging -g 10 > /dev/null 2&>1
+ then
+ echo "Child task off-cpu test [Failed record]"
+ err=1
+ return
+ fi
+ if ! perf evlist -i ${perfdata} | grep -q "offcpu-time"
+ then
+ echo "Child task off-cpu test [Failed no event]"
+ err=1
+ return
+ fi
+ # each process waits for read and write, so it should be more than 800 events
+ if ! perf report -i ${perfdata} -s comm -q -n -t ';' | \
+ awk -F ";" '{ if (NF > 3 && int($3) < 800) exit 1; }'
+ then
+ echo "Child task off-cpu test [Failed invalid output]"
+ err=1
+ return
+ fi
+ echo "Child task off-cpu test [Success]"
+}
+
+
+test_offcpu_priv
+
+if [ $err = 0 ]; then
+ test_offcpu_basic
+fi
+
+if [ $err = 0 ]; then
+ test_offcpu_child
+fi
cleanup
exit $err
--
2.37.0.rc0.161.g10f37bed90-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1)
2022-06-24 23:13 [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Namhyung Kim
` (5 preceding siblings ...)
2022-06-24 23:13 ` [PATCH 6/6] perf offcpu: Update offcpu test for child process Namhyung Kim
@ 2022-06-28 14:40 ` Arnaldo Carvalho de Melo
2022-06-28 16:51 ` Namhyung Kim
6 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-06-28 14:40 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
Blake Jones
Em Fri, Jun 24, 2022 at 04:13:07PM -0700, Namhyung Kim escreveu:
> Hello,
>
> The first patch fixes a build error on old kernels which has
> task_struct->state field that is renamed to __state. Actually I made
> a mistake when I wrote the code and assumed new kernel version.
>
> The second patch is to prevent invalid sample synthesize by
> disallowing unsupported sample types.
So I'll pick the first two for perf/urgent and then when that is merged
into perf/core pick the rest, ok?
- Arnaldo
> The rest of the series implements inheritance of offcpu events for the
> child processes. Unlike perf events, BPF cannot know which task it
> should track except for ones set in a BPF map at the beginning. Add
> another BPF program to the fork path and add the process id to the
> map if the parent is tracked.
>
> With this change, it can get the correct off-cpu events for child
> processes. I've tested it with perf bench sched messaging which
> creates a lot of processes.
>
> $ sudo perf record -e dummy --off-cpu -- perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.196 [sec]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.178 MB perf.data (851 samples) ]
>
>
> $ sudo perf report --stat | grep -A1 offcpu
> offcpu-time stats:
> SAMPLE events: 851
>
> The benchmark passes messages by read/write and it creates off-cpu
> events. With 400 processes, we can see more than 800 events.
>
> The child process tracking is also enabled when -p option is given.
> But -t option does NOT as it only cares about the specific threads.
> It may be different what perf_event does now, but I think it makes
> more sense.
>
> You can get it from 'perf/offcpu-child-v1' branch in my tree
>
> https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (6):
> perf offcpu: Fix a build failure on old kernels
> perf offcpu: Accept allowed sample types only
> perf offcpu: Check process id for the given workload
> perf offcpu: Parse process id separately
> perf offcpu: Track child processes
> perf offcpu: Update offcpu test for child process
>
> tools/perf/tests/shell/record_offcpu.sh | 57 ++++++++++++++++++++---
> tools/perf/util/bpf_off_cpu.c | 60 +++++++++++++++++++++++--
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 58 +++++++++++++++++++++---
> tools/perf/util/evsel.c | 9 ++++
> tools/perf/util/off_cpu.h | 9 ++++
> 5 files changed, 176 insertions(+), 17 deletions(-)
>
>
> base-commit: 9886142c7a2226439c1e3f7d9b69f9c7094c3ef6
> --
> 2.37.0.rc0.161.g10f37bed90-goog
--
- Arnaldo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1)
2022-06-28 14:40 ` [PATCHSET 0/6] perf tools: A couple of fixes for perf record --off-cpu (v1) Arnaldo Carvalho de Melo
@ 2022-06-28 16:51 ` Namhyung Kim
0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-06-28 16:51 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers,
linux-perf-users, Song Liu, Hao Luo, Milian Wolff, bpf,
Blake Jones
Hi Arnaldo,
On Tue, Jun 28, 2022 at 7:40 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Fri, Jun 24, 2022 at 04:13:07PM -0700, Namhyung Kim escreveu:
> > Hello,
> >
> > The first patch fixes a build error on old kernels which has
> > task_struct->state field that is renamed to __state. Actually I made
> > a mistake when I wrote the code and assumed new kernel version.
> >
> > The second patch is to prevent invalid sample synthesize by
> > disallowing unsupported sample types.
>
> So I'll pick the first two for perf/urgent and then when that is merged
> into perf/core pick the rest, ok?
Sounds good!
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 12+ messages in thread