All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Song Liu <song@kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Clark Williams <williams@redhat.com>,
	Kate Carcia <kcarcia@redhat.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Adrian Hunter <adrian.hunter@intel.com>,
	Changbin Du <changbin.du@huawei.com>, Hao Luo <haoluo@google.com>,
	Ian Rogers <irogers@google.com>,
	James Clark <james.clark@arm.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Roman Lozko <lozko.roma@gmail.com>,
	Stephane Eranian <eranian@google.com>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: BPF skels in perf .Re: [GIT PULL] perf tools changes for v6.4
Date: Thu, 4 May 2023 16:07:29 -0300	[thread overview]
Message-ID: <ZFQCccsx6GK+gY0j@kernel.org> (raw)
In-Reply-To: <CAEf4BzaUU9vZU6R_020ru5ct0wh-p1M3ZFet-vYqcHvb9bW1Cw@mail.gmail.com>

Em Thu, May 04, 2023 at 11:50:07AM -0700, Andrii Nakryiko escreveu:
> On Thu, May 4, 2023 at 10:52 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Andrii, can you add some more information about the usage of vmlinux.h
> > instead of using kernel headers?
 
> I'll just say that vmlinux.h is not a hard requirement to build BPF
> programs, it's more a convenience allowing easy access to definitions
> of both UAPI and kernel-internal structures for tracing needs and
> marking them relocatable using BPF CO-RE machinery. Lots of real-world
> applications just check-in pregenerated vmlinux.h to avoid build-time
> dependency on up-to-date host kernel and such.
 
> If vmlinux.h generation and usage is causing issues, though, given
> that perf's BPF programs don't seem to be using many different kernel
> types, it might be a better option to just use UAPI headers for public
> kernel type definitions, and just define CO-RE-relocatable minimal
> definitions locally in perf's BPF code for the other types necessary.
> E.g., if perf needs only pid and tgid from task_struct, this would
> suffice:
 
> struct task_struct {
>     int pid;
>     int tgid;
> } __attribute__((preserve_access_index));

Yeah, that seems like a way better approach, no vmlinux involved, libbpf
CO-RE notices that task_struct changed from this two integers version
(of course) and does the relocation to where it is in the running kernel
by using /sys/kernel/btf/vmlinux.

I looked and the creation of vmlinux.h was introduced in:

commit 944138f048f7d7591ec7568c94b21de8df2724d4
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Thu Jul 1 14:12:27 2021 -0700

    perf stat: Enable BPF counter with --for-each-cgroup

    Recently bperf was added to use BPF to count perf events for various
    purposes.  This is an extension for the approach and targetting to
    cgroup usages.

    Unlike the other bperf, it doesn't share the events with other
    processes but it'd reduces unnecessary events (and the overhead of
    multiplexing) for each monitored cgroup within the perf session.

    When --for-each-cgroup is used with --bpf-counters, it will open
    cgroup-switches event per cpu internally and attach the new BPF
    program to read given perf_events and to aggregate the results for
    cgroups.  It's only called when task is switched to a task in a
    different cgroup.

    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Link: http://lore.kernel.org/lkml/20210701211227.1403788-1-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Which I think was the first BPF skel to access a kernel data structure,
yeah:

tools/perf/util/bpf_skel/bperf_cgroup.bpf.c

For things like:

+static inline int get_cgroup_v1_idx(__u32 *cgrps, int size)
+{
+       struct task_struct *p = (void *)bpf_get_current_task();
+       struct cgroup *cgrp;
+       register int i = 0;
+       __u32 *elem;
+       int level;
+       int cnt;
+
+       cgrp = BPF_CORE_READ(p, cgroups, subsys[perf_event_cgrp_id], cgroup);
+       level = BPF_CORE_READ(cgrp, level);

So we can completely remove touching vmlinux from the perf building
process.

If we can get the revert of the patches making BPF skels to build by
default for v6.4 then we would do this work, test it thorougly and have
it available for v6.5.

Linus, would that be a way forward?

- Arnaldo

For reference, here is the definition for BPF_CORE_READ() from tools/lib/bpf/bpf_core_read.h

/*
 * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
 * when there are few pointer chasing steps.
 * E.g., what in non-BPF world (or in BPF w/ BCC) would be something like:
 *      int x = s->a.b.c->d.e->f->g;
 * can be succinctly achieved using BPF_CORE_READ as:
 *      int x = BPF_CORE_READ(s, a.b.c, d.e, f, g);
 *
 * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF
 * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically
 * equivalent to:
 * 1. const void *__t = s->a.b.c;
 * 2. __t = __t->d.e;
 * 3. __t = __t->f;
 * 4. return __t->g;
 *
 * Equivalence is logical, because there is a heavy type casting/preservation
 * involved, as well as all the reads are happening through
 * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to
 * emit CO-RE relocations.
 *
 * N.B. Only up to 9 "field accessors" are supported, which should be more
 * than enough for any practical purpose.
 */
#define BPF_CORE_READ(src, a, ...) ({                                       \
        ___type((src), a, ##__VA_ARGS__) __r;                               \
        BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);                  \
        __r;                                                                \
})

  reply	other threads:[~2023-05-04 19:10 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-03 21:18 [GIT PULL] perf tools changes for v6.4 Arnaldo Carvalho de Melo
2023-05-04  3:00 ` Linus Torvalds
2023-05-04  3:12   ` Linus Torvalds
2023-05-04  5:51     ` Ian Rogers
2023-05-04 18:36       ` Jiri Olsa
2023-05-04 11:09     ` BPF skels in perf .Re: " Arnaldo Carvalho de Melo
2023-05-04 17:25       ` Linus Torvalds
2023-05-04 17:52         ` Arnaldo Carvalho de Melo
2023-05-04 18:50           ` Andrii Nakryiko
2023-05-04 19:07             ` Arnaldo Carvalho de Melo [this message]
2023-05-04 21:48               ` Arnaldo Carvalho de Melo
2023-05-04 22:01                 ` Arnaldo Carvalho de Melo
2023-05-05 13:18                   ` Arnaldo Carvalho de Melo
2023-05-06  1:13                     ` Yang Jihong
2023-05-05 13:20                   ` Arnaldo Carvalho de Melo
2023-05-04 22:03                 ` Ian Rogers
2023-05-04 23:03                   ` Jiri Olsa
2023-05-04 23:15                     ` Namhyung Kim
2023-05-05  9:36                       ` Jiri Olsa
2023-05-04 23:19                     ` Ian Rogers
2023-05-05  9:39                       ` Jiri Olsa
2023-05-05 11:42                         ` Jiri Olsa
2023-05-05 13:33                     ` Arnaldo Carvalho de Melo
2023-05-05 15:14                       ` Alexei Starovoitov
2023-05-05 16:56                       ` [PATCH RFC/RFT] perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE. was " Arnaldo Carvalho de Melo
2023-05-05 17:04                         ` Ian Rogers
2023-05-05 20:43                           ` Jiri Olsa
2023-05-05 20:46                             ` Ian Rogers
2023-05-05 20:48                               ` Namhyung Kim
2023-05-10 18:56                                 ` Arnaldo Carvalho de Melo
2023-05-05 20:49                               ` Arnaldo Carvalho de Melo
2023-05-05 21:15                               ` Jiri Olsa
2023-05-05 21:21                                 ` Andrii Nakryiko
2023-05-05 21:52                                   ` Arnaldo Carvalho de Melo
2023-05-05 21:55                                     ` Andrii Nakryiko
2023-05-05 21:33                             ` Arnaldo Carvalho de Melo
2023-05-08 21:53                         ` Ian Rogers
2023-05-04 22:46                 ` Namhyung Kim
2023-05-07 19:15 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZFQCccsx6GK+gY0j@kernel.org \
    --to=acme@kernel.org \
    --cc=acme@redhat.com \
    --cc=adrian.hunter@intel.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=changbin.du@huawei.com \
    --cc=eranian@google.com \
    --cc=haoluo@google.com \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kcarcia@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=lozko.roma@gmail.com \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tmricht@linux.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.