netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Gilad Reti <gilad.reti@gmail.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants
Date: Wed, 6 Jan 2021 15:25:30 -0800	[thread overview]
Message-ID: <CAEf4BzZay-ofoZ-RURa0vTyQnEVaqF4_xuTAijSA9wgm=kt02g@mail.gmail.com> (raw)
In-Reply-To: <CANaYP3E5qYq0JznOkxVf6r3N-oM-WjKEw6kqPKD_ofQtk1gL+A@mail.gmail.com>

On Wed, Jan 6, 2021 at 2:10 AM Gilad Reti <gilad.reti@gmail.com> wrote:
>
> On Wed, Jan 6, 2021 at 8:09 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Jan 05, 2021 at 01:03:47PM -0800, Andrii Nakryiko wrote:
> > > > >
> > > > >   - handling 32-bit vs 64-bit UAPI structs uniformly;
> > > >
> > > > what do you mean?
> > > > 32-bit user space running on 64-bit kernel works through 'compat' syscalls.
> > > > If bpf progs are accessing 64-bit uapi structs in such case they're broken
> > > > and no amount of co-re can help.
> > >
> > > I know nothing about compat, so can't comment on that. But the way I
> > > understood the situation was the BPF program compiled once (well, at
> > > least from the unmodified source code), but runs on ARM32 and (on a
> > > separate physical host) on ARM64. And it's task is, say, to read UAPI
> > > kernel structures from syscall arguments.
> >
> > I'm not sure about arm, but on x86 there should be two different progs
> > for this to work (if we're talking about 32-bit userspace).
> > See below.
> >
> > > One simple example I found in UAPI definitions is struct timespec, it
> > > seems it's defined with `long`:
> > >
> > > struct timespec {
> > >         __kernel_old_time_t     tv_sec;         /* seconds */
> > >         long                    tv_nsec;        /* nanoseconds */
> > > }
> > >
> > > So if you were to trace clock_gettime(), you'd need to deal with
> > > differently-sized reads of tv_nsec, depending on whether you are
> > > running on the 32-bit or 64-bit host.
> >
> > I believe gettime is vdso, so that's not a syscall and there are no bpf hooks
> > available to track that.
> > For normal syscall with timespec argument like sys_recvmmsg and sys_nanosleep
> > the user needs to load two programs and attach to two different points:
> > fentry/__x64_sys_nanosleep and fentry/__ia32_sys_nanosleep.
> > (I'm not an expert on compat and not quite sure how _time32 suffix is handled,
> > so may be 4 different progs are needed).
> > The point is that there are several different entry points with different args.
> > ia32 user space will call into the kernel differently.
> > At the end different pieces of the syscall and compat handling will call
> > into hrtimer_nanosleep with ktime.
> > So if one bpf prog needs to work for 32 and 64 bit user space it should be
> > attached to common piece of code like hrtimer_nanosleep().
> > If it's attached to syscall entry it needs to attach to at least 2 different
> > entry points with 2 different progs.
>
> I think that while it is true that syscall handlers are different for
> 32 and 64 bit,
> it may happen that some syscall handlers will pass user-space pointers down to
> shared common functions, which may be hooked by ebpf probes. I am no expert, but
> I think that the cp_new_stat function is such an example; it gets a
> user struct stat* pointer,
> whose definition varies across different architectures (32/64-bit,
> different processor archs etc.)
>
> > I guess it's possible to load single prog and kprobe-attach it to
> > two different kernel functions, but code is kinda different.
> > For the simplest things like sys_nanosleep it might work and timespec
> > is simple enough structure to handle with sizeof(long) co-re tricks,
> > but it's not a generally applicable approach.
> > So for a tool to track 32-bit and 64-bit user space is quite tricky.
> >
> > If bpf prog doesn't care about user space and only needs to deal
> > with 64-bit kernel + 64-bit user space and 32-bit kernel + 32-bit user space
> > then it's a bit simpler, but probably still requires attaching
> > to two different points. On 32-bit x86 kernel there will be no
> > "fentry/__x64_sys_nanosleep" function.
> >
> > > There are probably other examples where UAPI structs use long instead
> > > of __u32 or __u64, but I didn't dig too deep.
> >
> > There is also crazy difference between compact_ioctl vs ioctl.
> >
> > The point I'm trying to get across is that the clean 32-bit processing is
> > a lot more involved than just CORE_READ_USER macro and I want to make sure that
> > the expections are set correctly.
> > BPF CO-RE prog may handle 32-bit and 64-bit at the same time, but
> > it's probably exception than the rule.
> >
> > > Let's see if Gilad can provide his perspective. I have no strong
> > > feelings about this and can send a patch removing CORE_READ_USER
> > > variants (they didn't make it into libbpf v0.3, so no harm or API
> > > stability concerns). BPF_PROBE_READ() and BPF_PROBE_READ_USER() are
> > > still useful for reading non-relocatable, but nested data structures,
> > > so I'd prefer to keep those.
> >
> > Let's hear from Gilad.
> > I'm not against BPF_CORE_READ_USER macro as-is. I was objecting
> > to the reasons of implementing it.
>
> If I am not mistaken (which is completely possible), I think that
> providing such a macro will
> not cause any more confusion than the bpf_probe_read_{,user}
> distinction already does,
> since BPF_CORE_READ_USER to BPF_CORE_READ is the same as bpf_probe_read_user
> to bpf_probe_read.

I think the biggest source of confusion is that USER part in
BPF_CORE_READ_USER refers to reading data from user address space, not
really user structs (which is kind of natural instinct here). CO-RE
*always* works only with kernel types, which is obvious if you have a
lot of experience with using CO-RE, but not initially, unfortunately.

So in the end, while a narrow use case, it seems like it might be
useful to have BPF_CORE_READ_USER. Maybe emphasizing (in the doc
comment) the fact that all types need to be kernel types would help a
bit? If not, it's not a big deal, because it's pretty easy for users
to just explicitly write bpf_probe_read_user(&dst, sizeof(dst),
__builtin_preserve_access_index(src)). But there is definitely a bit
of mental satisfaction with having all possible combinations of
CORE/non-CORE and USER/KERNEL variants :)

Btw, it seems these patches are already in bpf-next, so, Alexei,
please let me know if you insist on removing BPF_CORE_READ_USER()
variant and I'll send a follow up patch.

  reply	other threads:[~2021-01-06 23:26 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18 23:56 [PATCH bpf-next 0/3] Add user-space and non-CO-RE variants of BPF_CORE_READ() Andrii Nakryiko
2020-12-18 23:56 ` [PATCH bpf-next 1/3] libbpf: add user-space variants of BPF_CORE_READ() family of macros Andrii Nakryiko
2020-12-18 23:56 ` [PATCH bpf-next 2/3] libbpf: add non-CO-RE variants of BPF_CORE_READ() macro family Andrii Nakryiko
2020-12-18 23:56 ` [PATCH bpf-next 3/3] selftests/bpf: add tests for user- and non-CO-RE BPF_CORE_READ() variants Andrii Nakryiko
2021-01-05  3:46   ` Alexei Starovoitov
2021-01-05  5:08     ` Andrii Nakryiko
2021-01-05 19:03       ` Alexei Starovoitov
2021-01-05 21:03         ` Andrii Nakryiko
2021-01-06  6:09           ` Alexei Starovoitov
2021-01-06 10:10             ` Gilad Reti
2021-01-06 23:25               ` Andrii Nakryiko [this message]
2021-01-07 20:00                 ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzZay-ofoZ-RURa0vTyQnEVaqF4_xuTAijSA9wgm=kt02g@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=gilad.reti@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).