All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luka Oreskovic <luka.oreskovic@sartura.hr>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Luka Perkov <luka.perkov@sartura.hr>,
	Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
Subject: Re: Problems with pointer offsets on ARM32
Date: Tue, 15 Sep 2020 09:26:19 +0200	[thread overview]
Message-ID: <CA+XBgLWa7nWnQNTUdqgBK2E34PH8mUc_wUWR=_iM2Yjr=gxrVw@mail.gmail.com> (raw)
In-Reply-To: <CAEf4Bzb4JrfmENs197d30xU2fnWwu9_1rq-=n9szaWmmxaSckg@mail.gmail.com>

On Mon, Sep 14, 2020 at 7:49 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 12:55 AM Luka Oreskovic
> <luka.oreskovic@sartura.hr> wrote:
> >
> > On Fri, Sep 11, 2020 at 8:14 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Sep 11, 2020 at 9:45 AM Luka Oreskovic
> > > <luka.oreskovic@sartura.hr> wrote:
> > > >
> > > > Greetings everyone,
> > > >
> > > > I have been testing various BPF programs on the ARM32 architecture and
> > > > have encountered a strange error.
> > > >
> > > > When trying to run a simple program that prints out the arguments of
> > > > the open syscall,
> > > > I found some strange behaviour with the pointer offsets when accessing
> > > > the arguments:
> > > > The output of llvm-objdump differed from the verifier error dump log.
> > > > Notice the differences in lines 0 and 1. Why is the bytecode being
> > > > altered at runtime?
> > > >
> > > > I attached the program, the llvm-objdump result and the verifier dump below.
> > > >
> > > > Best wishes,
> > > > Luka Oreskovic
> > > >
> > > > BPF program
> > > > --------------------------------------------
> > > > #include "vmlinux.h"
> > > > #include <bpf/bpf_helpers.h>
> > > >
> > > > SEC("tracepoint/syscalls/sys_enter_open")
> > > > int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
> > > > {
> > > >         const char *arg1 = (const char *)ctx->args[0];
> > > >         int arg2 = (int)ctx->args[1];
> > > >
> > > >         bpf_printk("Open arg 1: %s\n", arg1);
> > > >         bpf_printk("Open arg 2: %d\n", arg2);
> > > >
> > > >         return 0;
> > > > }
> > > >
> > > > char LICENSE[] SEC("license") = "GPL";
> > > >
> > > >
> > > > llvm-objdump of program
> > > > --------------------------------------------
> > > > Disassembly of section tracepoint/syscalls/sys_enter_open:
> > > >
> > > > 0000000000000000 tracepoint__syscalls__sys_enter_open:
> > > > ;       int arg2 = (int)ctx->args[1];
> > > >        0:       79 16 18 00 00 00 00 00 r6 = *(u64 *)(r1 + 24)
> > > > ;       const char *arg1 = (const char *)ctx->args[0];
> > > >        1:       79 13 10 00 00 00 00 00 r3 = *(u64 *)(r1 + 16)
> > > >        2:       18 01 00 00 20 31 3a 20 00 00 00 00 25 73 0a 00 r1 =
> > > > 2941353058775328 ll
> > > > ;       bpf_printk("Open arg 1: %s\n", arg1);
> > > >        4:       7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
> > > >        5:       18 07 00 00 4f 70 65 6e 00 00 00 00 20 61 72 67 r7 =
> > > > 7454127125170581583 ll
> > > >        7:       7b 7a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r7
> > > >        8:       bf a1 00 00 00 00 00 00 r1 = r10
> > > >        9:       07 01 00 00 f0 ff ff ff r1 += -16
> > > >       10:       b7 02 00 00 10 00 00 00 r2 = 16
> > > >       11:       85 00 00 00 06 00 00 00 call 6
> > > >       12:       18 01 00 00 20 32 3a 20 00 00 00 00 25 64 0a 00 r1 =
> > > > 2924860384358944 ll
> > > > ;       bpf_printk("Open arg 2: %d\n", arg2);
> > > >       14:       7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
> > > >       15:       7b 7a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r7
> > > >       16:       bf a1 00 00 00 00 00 00 r1 = r10
> > > >       17:       07 01 00 00 f0 ff ff ff r1 += -16
> > > >       18:       b7 02 00 00 10 00 00 00 r2 = 16
> > > >       19:       bf 63 00 00 00 00 00 00 r3 = r6
> > > >       20:       85 00 00 00 06 00 00 00 call 6
> > > > ;       return 0;
> > > >       21:       b7 00 00 00 00 00 00 00 r0 = 0
> > > >       22:       95 00 00 00 00 00 00 00 exit
> > > >
> > > >
> > > > verifier output when running program
> > > > --------------------------------------------
> > > > libbpf: -- BEGIN DUMP LOG ---
> > > > libbpf:
> > > > Unrecognized arg#0 type PTR
> > > > ; int arg2 = (int)ctx->args[1];
> > > > 0: (79) r6 = *(u64 *)(r1 +16)
> > > > ; const char *arg1 = (const char *)ctx->args[0];
> > > > 1: (79) r3 = *(u64 *)(r1 +12)
> > > > invalid bpf_context access off=12 size=8
> > > > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
> > > > peak_states 0 mark_read 0
> > > >
> > > > libbpf: -- END LOG --
> > >
> > >
> > > One suspect would be libbpf's CO-RE relocations. Can you send full
> > > debug libbpf logs, it will have a full log of what libbpf adjusted.
> > > Please also include the definition of struct trace_event_raw_sys_enter
> > > from your vmlinux.h, as well as commit that your kernel was built from
> > > (to check the original definition).
> >
> >
> > Here is the data you requested. I can see the reallocations done by BPF CO-RE,
> > but I don't understand why they would have to be done in the first place
> > since I am using the vmlinux.h that has been generated using the
> > devices vmlinux.
> > Even if it made sense to change the pointer offsets, they shouldn't
> > break the program.
>
> Ok, there are few things at work here. I'll try to explain it and
> let's see how we can fix this. This is all due to 32-bit architecture,
> of course.
>
> 1. BPF "architecture" is strictly 64-bit one. This means that from the
> BPF program point of view pointers are 64-bit values, long is 64-bit
> as well.
> 2. look at struct trace_event_raw_sys_enter below: id and each element
> of args[] array are longs. What this means is the following:
>   - from BPF side of view those are assumed to be 64-bit integers
>   - but in reality in ARM32 kernel and in memory they are 32-bit integers.
>
> So the memory view of struct trace_event_raw_sys_enter differs quite a
> lot, depending in which context that struct definition is used. You
> can see that from the compiled BPF assembly output above and compare
> to the actual memory layout, taking into account that long is 4 bytes.
> For BPF args[0] is at 16 bytes, args[1] is at 24 bytes. But in reality
> it's at 12 byte offset for args[0] and 16 byte offset for args[1]. So
> what libbpf/CO-RE is doing is right, it actually relocates offset 16
> to 12 and 24 to 16. So without CO-RE you'd be accessing completely
> wrong offsets and reading garbage.
> 3. But read size matters as well. When compiling your BPF program
> clang sees 8-byte read and emits corresponding BPF assembly
> instruction. While in reality it has to be 4-byte read. Then later in
> the verifier (see tp_prog_is_valid_access() in
> kernel/trace/bpf_trace.c) there is a very simple check that all
> offsets are multiples of access size, and 12 % 8 != 0, so the verifier
> complains.
>
> It's a bit verbose explanation, but I hope it describes the problem.
> Basically, longs in a mixed BPF/32-bit land are error prone (as well
> as pointers in kernel data structures, btw).
>
> Now, a short-term fix for your use case would be to copy/paste struct
> trace_event_raw_sys_enter and replace all longs with ints. That will
> fix access size and offsets.
>
> Longer-term and a more proper fix is probably for libbpf to be smarter
> when emitting vmlinux.h and replace 4-byte longs with ints, knowing
> that vmlinux.h is going to be used from always-64-bit BPF
> architecture. I'll think a bit more about the implications of this,
> and if there are no problems with that approach, I'll submit a change
> soon enough. Pointers will still be a problem, though, and I'm not
> sure how we can solve it yet. Ideally there would be some solution
> that would cause Clang to emit 4-byte reads for pointers. I wonder if
> anyone else has any ideas around that.
>
> As a separate thing, it might be a good idea to allow 4-byte aligned
> 8-byte reads in the verifier on 32-bit architectures, given they are
> allowed on such architectures. Though for you that would lead to
> garbage in the upper 4 bytes of BPF register, which is equally bad.
>
> >
> >
> > Kernel commit
> > --------------------------------------------
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/?h=v5.9-rc3
> >
> >
> > Struct definition from vmlinux.h
> > --------------------------------------------
> > struct trace_event_raw_sys_enter {
> >         struct trace_entry ent;
> >         long int id;
> >         long unsigned int args[6];
> >         char __data[0];
> > };
> >
> >
>
> [...]
>
> > libbpf: collected 0 externs total
> > libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> > libbpf: sec 'tracepoint/syscalls/sys_enter_open': found 2 CO-RE relocations
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: kind
> > <byte_off> (0), spec is [2] struct trace_event_raw_sys_ent)
>
> these logs seem to be truncated, btw
>
> > libbpf: CO-RE relocating [2] struct trace_event_raw_sys_enter: found
> > target candidate [4639] struct trace_event_raw_sys_entr
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: matching
> > candidate #0 [4639] struct trace_event_raw_sys_enter.a)
>
> like here it actually should be something like:
>
> struct trace_event_raw_sys_enter.args[0]
>
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #0: patched
> > insn #0 (LDX/ST/STX) off 24 -> 16
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: kind
> > <byte_off> (0), spec is [2] struct trace_event_raw_sys_ent)
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: matching
> > candidate #0 [4639] struct trace_event_raw_sys_enter.a)
> > libbpf: prog 'tracepoint/syscalls/sys_enter_open': relo #1: patched
> > insn #1 (LDX/ST/STX) off 16 -> 12
> > libbpf: load bpf program failed: Permission denied
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf:
> > Unrecognized arg#0 type PTR
> > ; int arg2 = (int)ctx->args[1];
> > 0: (79) r6 = *(u64 *)(r1 +16)
> > ; const char *arg1 = (const char *)ctx->args[0];
> > 1: (79) r3 = *(u64 *)(r1 +12)
> > invalid bpf_context access off=12 size=8
> > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0
> > peak_states 0 mark_read 0
> >
> > libbpf: -- END LOG --
> > libbpf: failed to load program 'tracepoint/syscalls/sys_enter_open'
> > libbpf: failed to load object 'hello_bpf'
> > libbpf: failed to load BPF skeleton 'hello_bpf': -4007
> > failed to load BPF object -4007


Thank you very much for your informative reply. I look forward to the changes.

Best wishes,
Luka Oreskovic

  reply	other threads:[~2020-09-15  7:26 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-11 14:56 Problems with pointer offsets on ARM32 Luka Oreskovic
2020-09-11 18:13 ` Andrii Nakryiko
2020-09-14  7:55   ` Luka Oreskovic
2020-09-14 17:49     ` Andrii Nakryiko
2020-09-15  7:26       ` Luka Oreskovic [this message]
2020-09-30 23:09         ` Andrii Nakryiko
2020-10-01 13:40           ` Luka Oreskovic
2020-10-01 18:21             ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+XBgLWa7nWnQNTUdqgBK2E34PH8mUc_wUWR=_iM2Yjr=gxrVw@mail.gmail.com' \
    --to=luka.oreskovic@sartura.hr \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=juraj.vijtiuk@sartura.hr \
    --cc=luka.perkov@sartura.hr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.