All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>,
	Kenta Tada <kenta.tada@sony.com>,
	Hengqi Chen <hengqi.chen@gmail.com>
Subject: Re: [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes
Date: Thu, 7 Jul 2022 12:36:33 -0700	[thread overview]
Message-ID: <CAADnVQKm5otAmeoseC9=x7Co91CgY1oYpHFR1R+pRSB+X_mYFQ@mail.gmail.com> (raw)
In-Reply-To: <CAEf4BzaXBD86k8BYv7q4fFeyHALHcVUCbSpSG4=kfC0orydrCQ@mail.gmail.com>

On Thu, Jul 7, 2022 at 12:10 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 10:23 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 5:41 PM Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
> > > kretsyscall variants (for return kprobes) to allow users to kprobe
> > > syscall functions in kernel. These special sections allow to ignore
> > > complexities and differences between kernel versions and host
> > > architectures when it comes to syscall wrapper and corresponding
> > > __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
> > > CONFIG_ARCH_HAS_SYSCALL_WRAPPER.
> > >
> > > Combined with the use of BPF_KSYSCALL() macro, this allows to just
> > > specify intended syscall name and expected input arguments and leave
> > > dealing with all the variations to libbpf.
> > >
> > > In addition to SEC("ksyscall+") and SEC("kretsyscall+") add
> > > bpf_program__attach_ksyscall() API which allows to specify syscall name
> > > at runtime and provide associated BPF cookie value.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  tools/lib/bpf/libbpf.c          | 109 ++++++++++++++++++++++++++++++++
> > >  tools/lib/bpf/libbpf.h          |  16 +++++
> > >  tools/lib/bpf/libbpf.map        |   1 +
> > >  tools/lib/bpf/libbpf_internal.h |   2 +
> > >  4 files changed, 128 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index cb49408eb298..4749fb84e33d 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -4654,6 +4654,65 @@ static int probe_kern_btf_enum64(void)
> > >                                              strs, sizeof(strs)));
> > >  }
> > >
> > > +static const char *arch_specific_syscall_pfx(void)
> > > +{
> > > +#if defined(__x86_64__)
> > > +       return "x64";
> > > +#elif defined(__i386__)
> > > +       return "ia32";
> > > +#elif defined(__s390x__)
> > > +       return "s390x";
> > > +#elif defined(__s390__)
> > > +       return "s390";
> > > +#elif defined(__arm__)
> > > +       return "arm";
> > > +#elif defined(__aarch64__)
> > > +       return "arm64";
> > > +#elif defined(__mips__)
> > > +       return "mips";
> > > +#elif defined(__riscv)
> > > +       return "riscv";
> > > +#else
> > > +       return NULL;
> > > +#endif
> > > +}
> > > +
> > > +static int probe_kern_syscall_wrapper(void)
> > > +{
> > > +       /* available_filter_functions is a few times smaller than
> > > +        * /proc/kallsyms and has simpler format, so we use it as a faster way
> > > +        * to check that __<arch>_sys_bpf symbol exists, which is a sign that
> > > +        * kernel was built with CONFIG_ARCH_HAS_SYSCALL_WRAPPER and uses
> > > +        * syscall wrappers
> > > +        */
> > > +       static const char *kprobes_file = "/sys/kernel/tracing/available_filter_functions";
> > > +       char func_name[128], syscall_name[128];
> > > +       const char *ksys_pfx;
> > > +       FILE *f;
> > > +       int cnt;
> > > +
> > > +       ksys_pfx = arch_specific_syscall_pfx();
> > > +       if (!ksys_pfx)
> > > +               return 0;
> > > +
> > > +       f = fopen(kprobes_file, "r");
> > > +       if (!f)
> > > +               return 0;
> > > +
> > > +       snprintf(syscall_name, sizeof(syscall_name), "__%s_sys_bpf", ksys_pfx);
> > > +
> > > +       /* check if bpf() syscall wrapper is listed as possible kprobe */
> > > +       while ((cnt = fscanf(f, "%127s%*[^\n]\n", func_name)) == 1) {
> > > +               if (strcmp(func_name, syscall_name) == 0) {
> > > +                       fclose(f);
> > > +                       return 1;
> > > +               }
> > > +       }
> >
> > Maybe we should do the other way around ?
> > cat /proc/kallsyms |grep sys_bpf
> >
> > and figure out the prefix from there?
> > Then we won't need to do giant
> > #if defined(__x86_64__)
> > ...
> >
>
> Unfortunately this won't work well due to compat and 32-bit APIs (and
> bpf() syscall is particularly bad with also bpf_sys_bpf):
>
> $ sudo cat /proc/kallsyms| rg '_sys_bpf$'
> ffffffff811cb100 t __sys_bpf
> ffffffff811cd380 T bpf_sys_bpf
> ffffffff811cd520 T __x64_sys_bpf
> ffffffff811cd540 T __ia32_sys_bpf
> ffffffff8256fce0 r __ksymtab_bpf_sys_bpf
> ffffffff8259b5a2 r __kstrtabns_bpf_sys_bpf
> ffffffff8259bab9 r __kstrtab_bpf_sys_bpf
> ffffffff83abc400 t _eil_addr___ia32_sys_bpf
> ffffffff83abc410 t _eil_addr___x64_sys_bpf

That actually means that the current and proposed approaches
are both somewhat wrong, since they don't attach to both.
Meaning all syscalls done by 32-bit userspace will not be seen
by bpf prog.

Probably libbpf should attach to both:
__x64_sys_bpf and __ia32_sys_bpf.

__ksym and __kstr are easy to filter out, since they are
standard prefixes.
No idea what eil_addr is.

> $ sudo cat /proc/kallsyms| rg '_sys_mmap$'
> ffffffff81024480 T __x64_sys_mmap
> ffffffff810244c0 T __ia32_sys_mmap
> ffffffff83abae30 t _eil_addr___ia32_sys_mmap
> ffffffff83abae40 t _eil_addr___x64_sys_mmap
>
> We have similar arch-specific switches in few other places (USDT and
> lib path detection, for example), so it's not a new precedent (for
> better or worse).
>
>
> > /proc/kallsyms has world read permissions:
> > proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> > unlike available_filter_functions.
> >
> > Also tracefs might be mounted in a different dir than
> > /sys/kernel/tracing/
> > like
> > /sys/kernel/debug/tracing/
>
> Yeah, good point, was trying to avoid parsing more expensive kallsyms,
> but given it's done once, it might not be a big deal.

Soon we'll have an iterator for them so doing in-kernel
search of sys_bpf would be fast.

  reply	other threads:[~2022-07-07 19:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-07  0:41 [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 1/3] libbpf: improve and rename BPF_KPROBE_SYSCALL Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 2/3] libbpf: add ksyscall/kretsyscall sections support for syscall kprobes Andrii Nakryiko
2022-07-07 17:23   ` Alexei Starovoitov
2022-07-07 19:10     ` Andrii Nakryiko
2022-07-07 19:36       ` Alexei Starovoitov [this message]
2022-07-07 20:50         ` Andrii Nakryiko
2022-07-08 11:28       ` Jiri Olsa
2022-07-08 22:05         ` Andrii Nakryiko
2022-07-10  0:38           ` Alexei Starovoitov
2022-07-11 16:28             ` Andrii Nakryiko
2022-07-12  4:20               ` Alexei Starovoitov
2022-07-12  5:00                 ` Andrii Nakryiko
2022-07-08  9:35   ` Jiri Olsa
2022-07-08 22:04     ` Andrii Nakryiko
2022-07-07  0:41 ` [PATCH RFC bpf-next 3/3] selftests/bpf: use BPF_KSYSCALL and SEC("ksyscall") in selftests Andrii Nakryiko
2022-07-07  8:28 ` [PATCH RFC bpf-next 0/3] libbpf: add better syscall kprobing support Yaniv Agman
2022-07-07 20:56   ` Andrii Nakryiko
2022-07-11 16:23     ` Andrii Nakryiko
2022-07-07 15:51 ` Ilya Leoshkevich
2022-07-07 20:59   ` Andrii Nakryiko
2022-07-11 18:25     ` Ilya Leoshkevich
2022-07-12  4:24       ` Andrii Nakryiko
2022-07-13  7:12         ` Alan Maguire
2022-07-13 17:52           ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAADnVQKm5otAmeoseC9=x7Co91CgY1oYpHFR1R+pRSB+X_mYFQ@mail.gmail.com' \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hengqi.chen@gmail.com \
    --cc=iii@linux.ibm.com \
    --cc=kenta.tada@sony.com \
    --cc=kernel-team@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.