linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "Jiri Olsa" <jolsa@kernel.org>, "Ian Rogers" <irogers@google.com>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	linux-fsdevel@vger.kernel.org, brauner@kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-mm@kvack.org, "Daniel Müller" <deso@posteo.net>,
	"linux-perf-use." <linux-perf-users@vger.kernel.org>
Subject: Re: [PATCH 2/5] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
Date: Mon, 6 May 2024 11:41:43 -0700	[thread overview]
Message-ID: <CAEf4BzZJPY0tfLtvFA4BpQr71wO7iz-1-q16cENOAbuT1EX_og@mail.gmail.com> (raw)
In-Reply-To: <ZjjiFnNRbwsMJ3Gj@x1>

On Mon, May 6, 2024 at 6:58 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Sat, May 04, 2024 at 02:50:31PM -0700, Andrii Nakryiko wrote:
> > On Sat, May 4, 2024 at 8:28 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > On Fri, May 03, 2024 at 05:30:03PM -0700, Andrii Nakryiko wrote:
> > > > Note also, that fetching VMA name (e.g., backing file path, or special
> > > > hard-coded or user-provided names) is optional just like build ID. If
> > > > user sets vma_name_size to zero, kernel code won't attempt to retrieve
> > > > it, saving resources.
>
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> > > Where is the userspace code that uses this new api you have created?
>
> > So I added a faithful comparison of existing /proc/<pid>/maps vs new
> > ioctl() API to solve a common problem (as described above) in patch
> > #5. The plan is to put it in mentioned blazesym library at the very
> > least.
> >
> > I'm sure perf would benefit from this as well (cc'ed Arnaldo and
> > linux-perf-user), as they need to do stack symbolization as well.
>
> At some point, when BPF iterators became a thing we thought about, IIRC
> Jiri did some experimentation, but I lost track, of using BPF to
> synthesize PERF_RECORD_MMAP2 records for pre-existing maps, the layout
> as in uapi/linux/perf_event.h:
>
>         /*
>          * The MMAP2 records are an augmented version of MMAP, they add
>          * maj, min, ino numbers to be used to uniquely identify each mapping
>          *
>          * struct {
>          *      struct perf_event_header        header;
>          *
>          *      u32                             pid, tid;
>          *      u64                             addr;
>          *      u64                             len;
>          *      u64                             pgoff;
>          *      union {
>          *              struct {
>          *                      u32             maj;
>          *                      u32             min;
>          *                      u64             ino;
>          *                      u64             ino_generation;
>          *              };
>          *              struct {
>          *                      u8              build_id_size;
>          *                      u8              __reserved_1;
>          *                      u16             __reserved_2;
>          *                      u8              build_id[20];
>          *              };
>          *      };
>          *      u32                             prot, flags;
>          *      char                            filename[];
>          *      struct sample_id                sample_id;
>          * };
>          */
>         PERF_RECORD_MMAP2                       = 10,
>
>  *   PERF_RECORD_MISC_MMAP_BUILD_ID      - PERF_RECORD_MMAP2 event
>
> As perf.data files can be used for many purposes we want them all, so we

ok, so because you want them all and you don't know which VMAs will be
useful or not, it's a different problem. BPF iterators will be faster
purely due to avoiding binary -> text -> binary conversion path, but
other than that you'll still retrieve all VMAs.

You can still do the same full VMA iteration with this new API, of
course, but advantages are probably smaller as you'll be retrieving a
full set of VMAs regardless (though it would be interesting to compare
anyways).

> setup a meta data perf file descriptor to go on receiving the new mmaps
> while we read /proc/<pid>/maps, to reduce the chance of missing maps, do
> it in parallel, etc:
>
> ⬢[acme@toolbox perf-tools-next]$ perf record -h 'event synthesis'
>
>  Usage: perf record [<options>] [<command>]
>     or: perf record [<options>] -- <command> [<options>]
>
>         --num-thread-synthesize <n>
>                           number of threads to run for event synthesis
>         --synth <no|all|task|mmap|cgroup>
>                           Fine-tune event synthesis: default=all
>
> ⬢[acme@toolbox perf-tools-next]$
>
> For this specific initial synthesis of everything the plan, as mentioned
> about Jiri's experiments, was to use a BPF iterator to just feed the
> perf ring buffer with those events, that way userspace would just
> receive the usual records it gets when a new mmap is put in place, the
> BPF iterator would just feed the preexisting mmaps, as instructed via
> the perf_event_attr for the perf_event_open syscall.
>
> For people not wanting BPF, i.e. disabling it altogether in perf or
> disabling just BPF skels, then we would fallback to the current method,
> or to the one being discussed here when it becomes available.
>
> One thing to have in mind is for this iterator not to generate duplicate
> records for non-pre-existing mmaps, i.e. we would need some generation
> number that would be bumped when asking for such pre-existing maps
> PERF_RECORD_MMAP2 dumps.

Looking briefly at struct vm_area_struct, it doesn't seems like the
kernel maintains any sort of generation (at least not at
vm_area_struct level), so this would be nice to have, I'm sure, but
isn't really related to adding this API. Once the kernel does have
this "VMA generation" counter, it can be trivially added to this
binary interface (which can't be said about /proc/<pid>/maps,
unfortunately).

>
> > It will be up to other similar projects to adopt this, but we'll
> > definitely get this into blazesym as it is actually a problem for the
>
> At some point looking at plugging blazesym somehow with perf may be
> something to consider, indeed.

In the above I meant direct use of this new API in perf code itself,
but yes, blazesym is a generic library for symbolization that handles
ELF/DWARF/GSYM (and I believe more formats), so it indeed might make
sense to use it.

>
> - Arnaldo
>
> > abovementioned Oculus use case. We already had to make a tradeoff (see
> > [2], this wasn't done just because we could, but it was requested by
> > Oculus customers) to cache the contents of /proc/<pid>/maps and run
> > the risk of missing some shared libraries that can be loaded later. It
> > would be great to not have to do this tradeoff, which this new API
> > would enable.
> >
> >   [2] https://github.com/libbpf/blazesym/commit/6b521314126b3ae6f2add43e93234b59fed48ccf
> >

[...]

  parent reply	other threads:[~2024-05-06 18:41 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-04  0:30 [PATCH 0/5] ioctl()-based API to query VMAs from /proc/<pid>/maps Andrii Nakryiko
2024-05-04  0:30 ` [PATCH 1/5] fs/procfs: extract logic for getting VMA name constituents Andrii Nakryiko
2024-05-04  0:30 ` [PATCH 2/5] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps Andrii Nakryiko
2024-05-04 15:28   ` Greg KH
2024-05-04 21:50     ` Andrii Nakryiko
2024-05-06 13:58       ` Arnaldo Carvalho de Melo
2024-05-06 18:05         ` Namhyung Kim
2024-05-06 18:51           ` Andrii Nakryiko
2024-05-06 18:53           ` Arnaldo Carvalho de Melo
2024-05-06 19:16             ` Arnaldo Carvalho de Melo
2024-05-07 21:55               ` Namhyung Kim
2024-05-06 18:41         ` Andrii Nakryiko [this message]
2024-05-06 20:35           ` Arnaldo Carvalho de Melo
2024-05-07 16:36             ` Andrii Nakryiko
2024-05-04 23:36   ` kernel test robot
2024-05-07 18:10   ` Liam R. Howlett
2024-05-07 18:52     ` Andrii Nakryiko
2024-05-04  0:30 ` [PATCH 3/5] tools: sync uapi/linux/fs.h header into tools subdir Andrii Nakryiko
2024-05-04  0:30 ` [PATCH 4/5] selftests/bpf: make use of PROCFS_PROCMAP_QUERY ioctl, if available Andrii Nakryiko
2024-05-04  0:30 ` [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc/<pid>/maps APIs Andrii Nakryiko
2024-05-04 15:29   ` Greg KH
2024-05-04 21:57     ` Andrii Nakryiko
2024-05-05  5:09       ` Ian Rogers
2024-05-06 18:32         ` Andrii Nakryiko
2024-05-06 18:43           ` Ian Rogers
2024-05-07  5:06             ` Andrii Nakryiko
2024-05-07 17:29               ` Andrii Nakryiko
2024-05-07 22:27                 ` Namhyung Kim
2024-05-07 22:56                   ` Andrii Nakryiko
2024-05-08  0:36                     ` Arnaldo Carvalho de Melo
2024-05-04 15:32   ` Greg KH
2024-05-04 22:13     ` Andrii Nakryiko
2024-05-07 15:48       ` Liam R. Howlett
2024-05-07 16:10         ` Matthew Wilcox
2024-05-07 16:18           ` Liam R. Howlett
2024-05-07 16:27         ` Andrii Nakryiko
2024-05-07 18:06           ` Liam R. Howlett
2024-05-07 19:00             ` Andrii Nakryiko
2024-05-08  1:20               ` Liam R. Howlett
2024-05-04 11:24 ` [PATCH 0/5] ioctl()-based API to query VMAs from /proc/<pid>/maps Christian Brauner
2024-05-04 15:33   ` Greg KH
2024-05-04 21:50     ` Andrii Nakryiko
2024-05-04 21:50   ` Andrii Nakryiko
2024-05-05  5:26 ` Ian Rogers
2024-05-06 18:58   ` Andrii Nakryiko
2024-05-04 18:37 [PATCH 2/5] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps Alexey Dobriyan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzZJPY0tfLtvFA4BpQr71wO7iz-1-q16cENOAbuT1EX_og@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=acme@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=deso@posteo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).