From: YiFei Zhu <zhuyifei@google.com>
To: Yonghong Song <yhs@fb.com>
Cc: bpf <bpf@vger.kernel.org>, Stanislav Fomichev <sdf@google.com>,
Fangrui Song <maskray@google.com>
Subject: Re: BPF CO-RE and array fields in context struct
Date: Tue, 23 Nov 2021 12:14:28 -0800 [thread overview]
Message-ID: <CAA-VZPnSQTHf-PERo-=-Zzw9tcwiduRUm=t_On_xoSDOWXVSjw@mail.gmail.com> (raw)
In-Reply-To: <8ea7650f-e863-2648-cc3e-254ee9f6c0ae@fb.com>
On Tue, Nov 23, 2021 at 12:08 PM Yonghong Song <yhs@fb.com> wrote:
> On 11/23/21 8:15 AM, YiFei Zhu wrote:
> > On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@fb.com> wrote:
> >> On 11/22/21 12:44 PM, YiFei Zhu wrote:
> >>> On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I've been investigating the use of BPF CO-RE. I discovered that if I
> >>>> include vmlinux.h and have all structures annotated with
> >>>> __attribute__((preserve_access_index)), including the context struct,
> >>>> then a prog that accesses an array field in the context struct, in
> >>>> some particular way, cannot pass the verifier.
> >>>>
> >>>> A bunch of manual reduction plus creduce gives me this output:
> >>>>
> >>>> struct bpf_sock_ops {
> >>>> int family;
> >>>> int remote_ip6[];
> >>>> } __attribute__((preserve_access_index));
> >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>>> int a = d->family;
> >>>> int *c = d->remote_ip6;
> >>>> c[2] = a;
> >>>> return 0;
> >>>> }
> >>>>
> >>>> With Debian clang version 11.1.0-4+build1, this compiles to
> >>>>
> >>>> 0000000000000000 <b>:
> >>>> 0: b7 02 00 00 04 00 00 00 r2 = 4
> >>>> 1: bf 13 00 00 00 00 00 00 r3 = r1
> >>>> 2: 0f 23 00 00 00 00 00 00 r3 += r2
> >>>> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0)
> >>>> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1
> >>>> 5: b7 00 00 00 00 00 00 00 r0 = 0
> >>>> 6: 95 00 00 00 00 00 00 00 exit
> >>>>
> >>>> And the prog will be rejected with this verifier log:
> >>>>
> >>>> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>>> 0: (b7) r2 = 32
> >>>> 1: (bf) r3 = r1
> >>>> 2: (0f) r3 += r2
> >>>> last_idx 2 first_idx 0
> >>>> regs=4 stack=0 before 1: (bf) r3 = r1
> >>>> regs=4 stack=0 before 0: (b7) r2 = 32
> >>>> ; int a = d->family;
> >>>> 3: (61) r1 = *(u32 *)(r1 +20)
> >>>> ; c[2] = a;
> >>>> 4: (63) *(u32 *)(r3 +8) = r1
> >>>> dereference of modified ctx ptr R3 off=32 disallowed
> >>>> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states
> >>>> 0 peak_states 0 mark_read 0
> >>>>
> >>>> Looking at check_ctx_reg() and its callsite at check_mem_access() in
> >>>> verifier.c, it seems that the verifier really does not like when the
> >>>> context pointer has an offset, in this case the field offset of
> >>>> d->remote_ip6.
> >>>>
> >>>> I thought this is just an issue with array fields, that field offset
> >>>> relocations may have trouble expressing two field accesses (one struct
> >>>> member, one array memory). However, further testing reveals that this
> >>>> is not the case, because if I simplify out the local variables, the
> >>>> error is gone:
> >>>>
> >>>> struct bpf_sock_ops {
> >>>> int family;
> >>>> int remote_ip6[];
> >>>> } __attribute__((preserve_access_index));
> >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>>> d->remote_ip6[2] = d->family;
> >>>> return 0;
> >>>> }
> >>>>
> >>>> is compiled to:
> >>>>
> >>>> 0000000000000000 <b>:
> >>>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >>>> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2
> >>>> 2: b7 00 00 00 00 00 00 00 r0 = 0
> >>>> 3: 95 00 00 00 00 00 00 00 exit
> >>>>
> >>>> and is loaded as:
> >>>>
> >>>> ; d->remote_ip6[2] = d->family;
> >>>> 0: (61) r2 = *(u32 *)(r1 +20)
> >>>> ; d->remote_ip6[2] = d->family;
> >>>> 1: (63) *(u32 *)(r1 +40) = r2
> >>>> invalid bpf_context access off=40 size=4
> >>>>
> >>>> I believe this error is because d->remote_ip6 is read-only, that this
> >>>> modification might be more of a product of creduce, but we can see
> >>>> that the CO-RE adjected offset of the array element from the context
> >>>> pointer is correct: 32 to remote_ip6, 8 array index, so total offset
> >>>> is 40.
> >>>>
> >>>> Also note that removal of __attribute__((preserve_access_index)) from
> >>>> the first (miscompiled) program produces exactly the same bytecode as
> >>>> this new program (with no locals).
> >>>>
> >>>> What is going on here? Why does the access of an array in context in
> >>>> this particular way cause it to generate code that would not pass the
> >>>> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too
> >>>> strict?
> >>>
> >>> Additionally, testing the latest LLVM main branch, this test case,
> >>> which does not touch array fields at all, fails but succeeded with
> >>> clang version 11.1.0:
> >>>
> >>> struct bpf_sock_ops {
> >>> int op;
> >>> int bpf_sock_ops_cb_flags;
> >>> } __attribute__((preserve_access_index));
> >>> enum { a, b } static (*c)() = (void *)9;
> >>> enum d { e } f;
> >>> enum d g;
> >>> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) {
> >>> switch (i->op) {
> >>> case a:
> >>> f = g = c(i, i->bpf_sock_ops_cb_flags);
> >>> break;
> >>> case b:
> >>> f = g = c(i, i->bpf_sock_ops_cb_flags);
> >>> }
> >>> return 0;
> >>> }
> >>
> >> This is another issue which actually appears (even in bpf mailing list)
> >> multiple times.
> >>
> >> The following change should fix the issue:
> >>
> >> $ diff t1.c t1-good.c
> >> --- t1.c 2021-11-22 16:00:13.915921544 -0800
> >> +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800
> >> @@ -5,13 +5,14 @@
> >> enum { a, b } static (*c)() = (void *)9;
> >> enum d { e } f;
> >> enum d g;
> >> + #define __barrier asm volatile("" ::: "memory")
> >> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) {
> >> switch (i->op) {
> >> case a:
> >> - f = g = c(i, i->bpf_sock_ops_cb_flags);
> >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier;
> >> break;
> >> case b:
> >> - f = g = c(i, i->bpf_sock_ops_cb_flags);
> >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier;
> >> }
> >> return 0;
> >> }
> >>
> >> Basically add a compiler barrier at the end of case statements
> >> to prevent common code sinking.
> >>
> >> In the above case, for the original code, latest compiler did an
> >> optimization like
> >> case a:
> >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
> >> case b:
> >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
> >> common:
> >> val = load r1, tmp
> >> ...
> >>
> >> Note that reloc_offset is not sinked to the common code
> >> due to its special internal representation.
> >>
> >> To avoid such a code generation, add compiler barrier to
> >> the end of case statement to prevent load sinking in which case
> >> we will have
> >> val = load r1, reloc_offset(...)
> >> and verifier will be happy about this.
> >>
> >> The commit you listed below had a big change which may enable
> >> the above compiler optimization and llvm11 may just not do
> >> the code sinking optimization in this particular instance.
> >>
> >> I guess the compiler could still enforce this. But since it does
> >> not know whether the memory access is for context or not, doing
> >> so might hurt performance. But any way, this has appeared multiple
> >> times internally and also in the mailing list. We will take a further
> >> look.
> >
> > I see, thanks for the explanations. What is interesting is that prior
> > to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated
> > only once. The disassembly matches that of
> > case a:
> > case b:
> > tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
> > val = load r1, tmp
> >
> > Whereas with the compiler barriers, both compilers generate (no common code):
> >
> > 0000000000000000 <h>:
> > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> > 1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3>
> > 2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4>
> > 3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
> > 4: 85 00 00 00 09 00 00 00 call 9
> > 5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> > 7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> > 11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4>
> >
> > 0000000000000060 <LBB0_3>:
> > 12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
> > 13: 85 00 00 00 09 00 00 00 call 9
> > 14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> > 16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> > 17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> > 19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >
> > 00000000000000a0 <LBB0_4>:
> > 20: b7 00 00 00 00 00 00 00 r0 = 0
> > 21: 95 00 00 00 00 00 00 00 exit
> >
> > Did the linked commit create the special internal representation so
> > that they cannot be common code sinked, or is there some other issue
> > going on, or am I misunderstanding something?
>
> Yes, the linked commit added a special builtin with additional
> ever-increasing argument to prevent reloc_offset from sinking.
> This is to ensure the relocation related codes won't be separated into
> different basic blocks. But this won't be able to prevent the issue
> you described in the above.
Ah. I see. Thanks for the explanations. Looking forward to see the fixes!
YiFei Zhu
> > Thanks
> > YiFei Zhu
> >>> The bad code generation of latest LLVM:
> >>>
> >>> 0000000000000000 <h>:
> >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >>> 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2>
> >>> 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3>
> >>>
> >>> 0000000000000018 <LBB0_2>:
> >>> 3: b7 03 00 00 04 00 00 00 r3 = 4
> >>> 4: bf 12 00 00 00 00 00 00 r2 = r1
> >>> 5: 0f 32 00 00 00 00 00 00 r2 += r3
> >>> 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0)
> >>> 7: 85 00 00 00 09 00 00 00 call 9
> >>> 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >>> 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >>> 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >>> 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >>>
> >>> 0000000000000070 <LBB0_3>:
> >>> 14: b7 00 00 00 00 00 00 00 r0 = 0
> >>> 15: 95 00 00 00 00 00 00 00 exit
> >>>
> >>> The good code generation of LLVM 11.1.0:
> >>>
> >>> 0000000000000000 <h>:
> >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >>> 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2>
> >>> 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
> >>> 3: 85 00 00 00 09 00 00 00 call 9
> >>> 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >>> 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >>> 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >>> 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >>>
> >>> 0000000000000050 <LBB0_2>:
> >>> 10: b7 00 00 00 00 00 00 00 r0 = 0
> >>> 11: 95 00 00 00 00 00 00 00 exit
> >>>
> >>> A bisect points me to this commit in LLVM as the first bad commit:
> >>>
> >>> commit 54d9f743c8b0f501288119123cf1828bf7ade69c
> >>> Author: Yonghong Song <yhs@fb.com>
> >>> Date: Wed Sep 2 22:56:41 2020 -0700
> >>>
> >>> BPF: move AbstractMemberAccess and PreserveDIType passes to
> >>> EP_EarlyAsPossible
> >>>
> >>> Move abstractMemberAccess and PreserveDIType passes as early as
> >>> possible, right after clang code generation.
> >>>
> >>> [...]
> >>>
> >>> Differential Revision: https://reviews.llvm.org/D87153
> >>>
> >>> YiFei Zhu
> >>>
next prev parent reply other threads:[~2021-11-23 20:14 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-22 16:19 BPF CO-RE and array fields in context struct YiFei Zhu
2021-11-22 20:44 ` YiFei Zhu
2021-11-23 0:24 ` Yonghong Song
2021-11-23 16:15 ` YiFei Zhu
2021-11-23 20:08 ` Yonghong Song
2021-11-23 20:14 ` YiFei Zhu [this message]
2021-11-22 23:56 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAA-VZPnSQTHf-PERo-=-Zzw9tcwiduRUm=t_On_xoSDOWXVSjw@mail.gmail.com' \
--to=zhuyifei@google.com \
--cc=bpf@vger.kernel.org \
--cc=maskray@google.com \
--cc=sdf@google.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).