* BPF CO-RE and array fields in context struct @ 2021-11-22 16:19 YiFei Zhu 2021-11-22 20:44 ` YiFei Zhu 2021-11-22 23:56 ` Yonghong Song 0 siblings, 2 replies; 7+ messages in thread From: YiFei Zhu @ 2021-11-22 16:19 UTC (permalink / raw) To: bpf, Yonghong Song; +Cc: Stanislav Fomichev, Fangrui Song Hi I've been investigating the use of BPF CO-RE. I discovered that if I include vmlinux.h and have all structures annotated with __attribute__((preserve_access_index)), including the context struct, then a prog that accesses an array field in the context struct, in some particular way, cannot pass the verifier. A bunch of manual reduction plus creduce gives me this output: struct bpf_sock_ops { int family; int remote_ip6[]; } __attribute__((preserve_access_index)); __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { int a = d->family; int *c = d->remote_ip6; c[2] = a; return 0; } With Debian clang version 11.1.0-4+build1, this compiles to 0000000000000000 <b>: 0: b7 02 00 00 04 00 00 00 r2 = 4 1: bf 13 00 00 00 00 00 00 r3 = r1 2: 0f 23 00 00 00 00 00 00 r3 += r2 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 5: b7 00 00 00 00 00 00 00 r0 = 0 6: 95 00 00 00 00 00 00 00 exit And the prog will be rejected with this verifier log: ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { 0: (b7) r2 = 32 1: (bf) r3 = r1 2: (0f) r3 += r2 last_idx 2 first_idx 0 regs=4 stack=0 before 1: (bf) r3 = r1 regs=4 stack=0 before 0: (b7) r2 = 32 ; int a = d->family; 3: (61) r1 = *(u32 *)(r1 +20) ; c[2] = a; 4: (63) *(u32 *)(r3 +8) = r1 dereference of modified ctx ptr R3 off=32 disallowed processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Looking at check_ctx_reg() and its callsite at check_mem_access() in verifier.c, it seems that the verifier really does not like when the context pointer has an offset, in this case the field offset of d->remote_ip6. I thought this is just an issue with array fields, that field offset relocations may have trouble expressing two field accesses (one struct member, one array memory). However, further testing reveals that this is not the case, because if I simplify out the local variables, the error is gone: struct bpf_sock_ops { int family; int remote_ip6[]; } __attribute__((preserve_access_index)); __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { d->remote_ip6[2] = d->family; return 0; } is compiled to: 0000000000000000 <b>: 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 2: b7 00 00 00 00 00 00 00 r0 = 0 3: 95 00 00 00 00 00 00 00 exit and is loaded as: ; d->remote_ip6[2] = d->family; 0: (61) r2 = *(u32 *)(r1 +20) ; d->remote_ip6[2] = d->family; 1: (63) *(u32 *)(r1 +40) = r2 invalid bpf_context access off=40 size=4 I believe this error is because d->remote_ip6 is read-only, that this modification might be more of a product of creduce, but we can see that the CO-RE adjected offset of the array element from the context pointer is correct: 32 to remote_ip6, 8 array index, so total offset is 40. Also note that removal of __attribute__((preserve_access_index)) from the first (miscompiled) program produces exactly the same bytecode as this new program (with no locals). What is going on here? Why does the access of an array in context in this particular way cause it to generate code that would not pass the verifier? Is it a bug in Clang/LLVM, or is it the verifier being too strict? Thanks YiFei Zhu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-22 16:19 BPF CO-RE and array fields in context struct YiFei Zhu @ 2021-11-22 20:44 ` YiFei Zhu 2021-11-23 0:24 ` Yonghong Song 2021-11-22 23:56 ` Yonghong Song 1 sibling, 1 reply; 7+ messages in thread From: YiFei Zhu @ 2021-11-22 20:44 UTC (permalink / raw) To: bpf, Yonghong Song; +Cc: Stanislav Fomichev, Fangrui Song On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote: > > Hi > > I've been investigating the use of BPF CO-RE. I discovered that if I > include vmlinux.h and have all structures annotated with > __attribute__((preserve_access_index)), including the context struct, > then a prog that accesses an array field in the context struct, in > some particular way, cannot pass the verifier. > > A bunch of manual reduction plus creduce gives me this output: > > struct bpf_sock_ops { > int family; > int remote_ip6[]; > } __attribute__((preserve_access_index)); > __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > int a = d->family; > int *c = d->remote_ip6; > c[2] = a; > return 0; > } > > With Debian clang version 11.1.0-4+build1, this compiles to > > 0000000000000000 <b>: > 0: b7 02 00 00 04 00 00 00 r2 = 4 > 1: bf 13 00 00 00 00 00 00 r3 = r1 > 2: 0f 23 00 00 00 00 00 00 r3 += r2 > 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) > 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 > 5: b7 00 00 00 00 00 00 00 r0 = 0 > 6: 95 00 00 00 00 00 00 00 exit > > And the prog will be rejected with this verifier log: > > ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > 0: (b7) r2 = 32 > 1: (bf) r3 = r1 > 2: (0f) r3 += r2 > last_idx 2 first_idx 0 > regs=4 stack=0 before 1: (bf) r3 = r1 > regs=4 stack=0 before 0: (b7) r2 = 32 > ; int a = d->family; > 3: (61) r1 = *(u32 *)(r1 +20) > ; c[2] = a; > 4: (63) *(u32 *)(r3 +8) = r1 > dereference of modified ctx ptr R3 off=32 disallowed > processed 5 insns (limit 1000000) max_states_per_insn 0 total_states > 0 peak_states 0 mark_read 0 > > Looking at check_ctx_reg() and its callsite at check_mem_access() in > verifier.c, it seems that the verifier really does not like when the > context pointer has an offset, in this case the field offset of > d->remote_ip6. > > I thought this is just an issue with array fields, that field offset > relocations may have trouble expressing two field accesses (one struct > member, one array memory). However, further testing reveals that this > is not the case, because if I simplify out the local variables, the > error is gone: > > struct bpf_sock_ops { > int family; > int remote_ip6[]; > } __attribute__((preserve_access_index)); > __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > d->remote_ip6[2] = d->family; > return 0; > } > > is compiled to: > > 0000000000000000 <b>: > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 > 2: b7 00 00 00 00 00 00 00 r0 = 0 > 3: 95 00 00 00 00 00 00 00 exit > > and is loaded as: > > ; d->remote_ip6[2] = d->family; > 0: (61) r2 = *(u32 *)(r1 +20) > ; d->remote_ip6[2] = d->family; > 1: (63) *(u32 *)(r1 +40) = r2 > invalid bpf_context access off=40 size=4 > > I believe this error is because d->remote_ip6 is read-only, that this > modification might be more of a product of creduce, but we can see > that the CO-RE adjected offset of the array element from the context > pointer is correct: 32 to remote_ip6, 8 array index, so total offset > is 40. > > Also note that removal of __attribute__((preserve_access_index)) from > the first (miscompiled) program produces exactly the same bytecode as > this new program (with no locals). > > What is going on here? Why does the access of an array in context in > this particular way cause it to generate code that would not pass the > verifier? Is it a bug in Clang/LLVM, or is it the verifier being too > strict? Additionally, testing the latest LLVM main branch, this test case, which does not touch array fields at all, fails but succeeded with clang version 11.1.0: struct bpf_sock_ops { int op; int bpf_sock_ops_cb_flags; } __attribute__((preserve_access_index)); enum { a, b } static (*c)() = (void *)9; enum d { e } f; enum d g; __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { switch (i->op) { case a: f = g = c(i, i->bpf_sock_ops_cb_flags); break; case b: f = g = c(i, i->bpf_sock_ops_cb_flags); } return 0; } The bad code generation of latest LLVM: 0000000000000000 <h>: 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> 0000000000000018 <LBB0_2>: 3: b7 03 00 00 04 00 00 00 r3 = 4 4: bf 12 00 00 00 00 00 00 r2 = r1 5: 0f 32 00 00 00 00 00 00 r2 += r3 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) 7: 85 00 00 00 09 00 00 00 call 9 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 0000000000000070 <LBB0_3>: 14: b7 00 00 00 00 00 00 00 r0 = 0 15: 95 00 00 00 00 00 00 00 exit The good code generation of LLVM 11.1.0: 0000000000000000 <h>: 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) 3: 85 00 00 00 09 00 00 00 call 9 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 0000000000000050 <LBB0_2>: 10: b7 00 00 00 00 00 00 00 r0 = 0 11: 95 00 00 00 00 00 00 00 exit A bisect points me to this commit in LLVM as the first bad commit: commit 54d9f743c8b0f501288119123cf1828bf7ade69c Author: Yonghong Song <yhs@fb.com> Date: Wed Sep 2 22:56:41 2020 -0700 BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible Move abstractMemberAccess and PreserveDIType passes as early as possible, right after clang code generation. [...] Differential Revision: https://reviews.llvm.org/D87153 YiFei Zhu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-22 20:44 ` YiFei Zhu @ 2021-11-23 0:24 ` Yonghong Song 2021-11-23 16:15 ` YiFei Zhu 0 siblings, 1 reply; 7+ messages in thread From: Yonghong Song @ 2021-11-23 0:24 UTC (permalink / raw) To: YiFei Zhu, bpf; +Cc: Stanislav Fomichev, Fangrui Song On 11/22/21 12:44 PM, YiFei Zhu wrote: > On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote: >> >> Hi >> >> I've been investigating the use of BPF CO-RE. I discovered that if I >> include vmlinux.h and have all structures annotated with >> __attribute__((preserve_access_index)), including the context struct, >> then a prog that accesses an array field in the context struct, in >> some particular way, cannot pass the verifier. >> >> A bunch of manual reduction plus creduce gives me this output: >> >> struct bpf_sock_ops { >> int family; >> int remote_ip6[]; >> } __attribute__((preserve_access_index)); >> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >> int a = d->family; >> int *c = d->remote_ip6; >> c[2] = a; >> return 0; >> } >> >> With Debian clang version 11.1.0-4+build1, this compiles to >> >> 0000000000000000 <b>: >> 0: b7 02 00 00 04 00 00 00 r2 = 4 >> 1: bf 13 00 00 00 00 00 00 r3 = r1 >> 2: 0f 23 00 00 00 00 00 00 r3 += r2 >> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) >> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 >> 5: b7 00 00 00 00 00 00 00 r0 = 0 >> 6: 95 00 00 00 00 00 00 00 exit >> >> And the prog will be rejected with this verifier log: >> >> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >> 0: (b7) r2 = 32 >> 1: (bf) r3 = r1 >> 2: (0f) r3 += r2 >> last_idx 2 first_idx 0 >> regs=4 stack=0 before 1: (bf) r3 = r1 >> regs=4 stack=0 before 0: (b7) r2 = 32 >> ; int a = d->family; >> 3: (61) r1 = *(u32 *)(r1 +20) >> ; c[2] = a; >> 4: (63) *(u32 *)(r3 +8) = r1 >> dereference of modified ctx ptr R3 off=32 disallowed >> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states >> 0 peak_states 0 mark_read 0 >> >> Looking at check_ctx_reg() and its callsite at check_mem_access() in >> verifier.c, it seems that the verifier really does not like when the >> context pointer has an offset, in this case the field offset of >> d->remote_ip6. >> >> I thought this is just an issue with array fields, that field offset >> relocations may have trouble expressing two field accesses (one struct >> member, one array memory). However, further testing reveals that this >> is not the case, because if I simplify out the local variables, the >> error is gone: >> >> struct bpf_sock_ops { >> int family; >> int remote_ip6[]; >> } __attribute__((preserve_access_index)); >> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >> d->remote_ip6[2] = d->family; >> return 0; >> } >> >> is compiled to: >> >> 0000000000000000 <b>: >> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) >> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 >> 2: b7 00 00 00 00 00 00 00 r0 = 0 >> 3: 95 00 00 00 00 00 00 00 exit >> >> and is loaded as: >> >> ; d->remote_ip6[2] = d->family; >> 0: (61) r2 = *(u32 *)(r1 +20) >> ; d->remote_ip6[2] = d->family; >> 1: (63) *(u32 *)(r1 +40) = r2 >> invalid bpf_context access off=40 size=4 >> >> I believe this error is because d->remote_ip6 is read-only, that this >> modification might be more of a product of creduce, but we can see >> that the CO-RE adjected offset of the array element from the context >> pointer is correct: 32 to remote_ip6, 8 array index, so total offset >> is 40. >> >> Also note that removal of __attribute__((preserve_access_index)) from >> the first (miscompiled) program produces exactly the same bytecode as >> this new program (with no locals). >> >> What is going on here? Why does the access of an array in context in >> this particular way cause it to generate code that would not pass the >> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too >> strict? > > Additionally, testing the latest LLVM main branch, this test case, > which does not touch array fields at all, fails but succeeded with > clang version 11.1.0: > > struct bpf_sock_ops { > int op; > int bpf_sock_ops_cb_flags; > } __attribute__((preserve_access_index)); > enum { a, b } static (*c)() = (void *)9; > enum d { e } f; > enum d g; > __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > switch (i->op) { > case a: > f = g = c(i, i->bpf_sock_ops_cb_flags); > break; > case b: > f = g = c(i, i->bpf_sock_ops_cb_flags); > } > return 0; > } This is another issue which actually appears (even in bpf mailing list) multiple times. The following change should fix the issue: $ diff t1.c t1-good.c --- t1.c 2021-11-22 16:00:13.915921544 -0800 +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800 @@ -5,13 +5,14 @@ enum { a, b } static (*c)() = (void *)9; enum d { e } f; enum d g; + #define __barrier asm volatile("" ::: "memory") __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { switch (i->op) { case a: - f = g = c(i, i->bpf_sock_ops_cb_flags); + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; break; case b: - f = g = c(i, i->bpf_sock_ops_cb_flags); + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; } return 0; } Basically add a compiler barrier at the end of case statements to prevent common code sinking. In the above case, for the original code, latest compiler did an optimization like case a: tmp = reloc_offset(i->bpf_sock_ops_cb_flags); case b: tmp = reloc_offset(i->bpf_sock_ops_cb_flags); common: val = load r1, tmp ... Note that reloc_offset is not sinked to the common code due to its special internal representation. To avoid such a code generation, add compiler barrier to the end of case statement to prevent load sinking in which case we will have val = load r1, reloc_offset(...) and verifier will be happy about this. The commit you listed below had a big change which may enable the above compiler optimization and llvm11 may just not do the code sinking optimization in this particular instance. I guess the compiler could still enforce this. But since it does not know whether the memory access is for context or not, doing so might hurt performance. But any way, this has appeared multiple times internally and also in the mailing list. We will take a further look. > > The bad code generation of latest LLVM: > > 0000000000000000 <h>: > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> > 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> > > 0000000000000018 <LBB0_2>: > 3: b7 03 00 00 04 00 00 00 r3 = 4 > 4: bf 12 00 00 00 00 00 00 r2 = r1 > 5: 0f 32 00 00 00 00 00 00 r2 += r3 > 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) > 7: 85 00 00 00 09 00 00 00 call 9 > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 0000000000000070 <LBB0_3>: > 14: b7 00 00 00 00 00 00 00 r0 = 0 > 15: 95 00 00 00 00 00 00 00 exit > > The good code generation of LLVM 11.1.0: > > 0000000000000000 <h>: > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> > 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > 3: 85 00 00 00 09 00 00 00 call 9 > 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 0000000000000050 <LBB0_2>: > 10: b7 00 00 00 00 00 00 00 r0 = 0 > 11: 95 00 00 00 00 00 00 00 exit > > A bisect points me to this commit in LLVM as the first bad commit: > > commit 54d9f743c8b0f501288119123cf1828bf7ade69c > Author: Yonghong Song <yhs@fb.com> > Date: Wed Sep 2 22:56:41 2020 -0700 > > BPF: move AbstractMemberAccess and PreserveDIType passes to > EP_EarlyAsPossible > > Move abstractMemberAccess and PreserveDIType passes as early as > possible, right after clang code generation. > > [...] > > Differential Revision: https://reviews.llvm.org/D87153 > > YiFei Zhu > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-23 0:24 ` Yonghong Song @ 2021-11-23 16:15 ` YiFei Zhu 2021-11-23 20:08 ` Yonghong Song 0 siblings, 1 reply; 7+ messages in thread From: YiFei Zhu @ 2021-11-23 16:15 UTC (permalink / raw) To: Yonghong Song; +Cc: bpf, Stanislav Fomichev, Fangrui Song On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@fb.com> wrote: > On 11/22/21 12:44 PM, YiFei Zhu wrote: > > On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote: > >> > >> Hi > >> > >> I've been investigating the use of BPF CO-RE. I discovered that if I > >> include vmlinux.h and have all structures annotated with > >> __attribute__((preserve_access_index)), including the context struct, > >> then a prog that accesses an array field in the context struct, in > >> some particular way, cannot pass the verifier. > >> > >> A bunch of manual reduction plus creduce gives me this output: > >> > >> struct bpf_sock_ops { > >> int family; > >> int remote_ip6[]; > >> } __attribute__((preserve_access_index)); > >> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >> int a = d->family; > >> int *c = d->remote_ip6; > >> c[2] = a; > >> return 0; > >> } > >> > >> With Debian clang version 11.1.0-4+build1, this compiles to > >> > >> 0000000000000000 <b>: > >> 0: b7 02 00 00 04 00 00 00 r2 = 4 > >> 1: bf 13 00 00 00 00 00 00 r3 = r1 > >> 2: 0f 23 00 00 00 00 00 00 r3 += r2 > >> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) > >> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 > >> 5: b7 00 00 00 00 00 00 00 r0 = 0 > >> 6: 95 00 00 00 00 00 00 00 exit > >> > >> And the prog will be rejected with this verifier log: > >> > >> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >> 0: (b7) r2 = 32 > >> 1: (bf) r3 = r1 > >> 2: (0f) r3 += r2 > >> last_idx 2 first_idx 0 > >> regs=4 stack=0 before 1: (bf) r3 = r1 > >> regs=4 stack=0 before 0: (b7) r2 = 32 > >> ; int a = d->family; > >> 3: (61) r1 = *(u32 *)(r1 +20) > >> ; c[2] = a; > >> 4: (63) *(u32 *)(r3 +8) = r1 > >> dereference of modified ctx ptr R3 off=32 disallowed > >> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states > >> 0 peak_states 0 mark_read 0 > >> > >> Looking at check_ctx_reg() and its callsite at check_mem_access() in > >> verifier.c, it seems that the verifier really does not like when the > >> context pointer has an offset, in this case the field offset of > >> d->remote_ip6. > >> > >> I thought this is just an issue with array fields, that field offset > >> relocations may have trouble expressing two field accesses (one struct > >> member, one array memory). However, further testing reveals that this > >> is not the case, because if I simplify out the local variables, the > >> error is gone: > >> > >> struct bpf_sock_ops { > >> int family; > >> int remote_ip6[]; > >> } __attribute__((preserve_access_index)); > >> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >> d->remote_ip6[2] = d->family; > >> return 0; > >> } > >> > >> is compiled to: > >> > >> 0000000000000000 <b>: > >> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 > >> 2: b7 00 00 00 00 00 00 00 r0 = 0 > >> 3: 95 00 00 00 00 00 00 00 exit > >> > >> and is loaded as: > >> > >> ; d->remote_ip6[2] = d->family; > >> 0: (61) r2 = *(u32 *)(r1 +20) > >> ; d->remote_ip6[2] = d->family; > >> 1: (63) *(u32 *)(r1 +40) = r2 > >> invalid bpf_context access off=40 size=4 > >> > >> I believe this error is because d->remote_ip6 is read-only, that this > >> modification might be more of a product of creduce, but we can see > >> that the CO-RE adjected offset of the array element from the context > >> pointer is correct: 32 to remote_ip6, 8 array index, so total offset > >> is 40. > >> > >> Also note that removal of __attribute__((preserve_access_index)) from > >> the first (miscompiled) program produces exactly the same bytecode as > >> this new program (with no locals). > >> > >> What is going on here? Why does the access of an array in context in > >> this particular way cause it to generate code that would not pass the > >> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too > >> strict? > > > > Additionally, testing the latest LLVM main branch, this test case, > > which does not touch array fields at all, fails but succeeded with > > clang version 11.1.0: > > > > struct bpf_sock_ops { > > int op; > > int bpf_sock_ops_cb_flags; > > } __attribute__((preserve_access_index)); > > enum { a, b } static (*c)() = (void *)9; > > enum d { e } f; > > enum d g; > > __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > > switch (i->op) { > > case a: > > f = g = c(i, i->bpf_sock_ops_cb_flags); > > break; > > case b: > > f = g = c(i, i->bpf_sock_ops_cb_flags); > > } > > return 0; > > } > > This is another issue which actually appears (even in bpf mailing list) > multiple times. > > The following change should fix the issue: > > $ diff t1.c t1-good.c > --- t1.c 2021-11-22 16:00:13.915921544 -0800 > +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800 > @@ -5,13 +5,14 @@ > enum { a, b } static (*c)() = (void *)9; > enum d { e } f; > enum d g; > + #define __barrier asm volatile("" ::: "memory") > __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > switch (i->op) { > case a: > - f = g = c(i, i->bpf_sock_ops_cb_flags); > + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > break; > case b: > - f = g = c(i, i->bpf_sock_ops_cb_flags); > + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > } > return 0; > } > > Basically add a compiler barrier at the end of case statements > to prevent common code sinking. > > In the above case, for the original code, latest compiler did an > optimization like > case a: > tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > case b: > tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > common: > val = load r1, tmp > ... > > Note that reloc_offset is not sinked to the common code > due to its special internal representation. > > To avoid such a code generation, add compiler barrier to > the end of case statement to prevent load sinking in which case > we will have > val = load r1, reloc_offset(...) > and verifier will be happy about this. > > The commit you listed below had a big change which may enable > the above compiler optimization and llvm11 may just not do > the code sinking optimization in this particular instance. > > I guess the compiler could still enforce this. But since it does > not know whether the memory access is for context or not, doing > so might hurt performance. But any way, this has appeared multiple > times internally and also in the mailing list. We will take a further > look. I see, thanks for the explanations. What is interesting is that prior to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated only once. The disassembly matches that of case a: case b: tmp = reloc_offset(i->bpf_sock_ops_cb_flags); val = load r1, tmp Whereas with the compiler barriers, both compilers generate (no common code): 0000000000000000 <h>: 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) 1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3> 2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4> 3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) 4: 85 00 00 00 09 00 00 00 call 9 5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4> 0000000000000060 <LBB0_3>: 12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) 13: 85 00 00 00 09 00 00 00 call 9 14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 00000000000000a0 <LBB0_4>: 20: b7 00 00 00 00 00 00 00 r0 = 0 21: 95 00 00 00 00 00 00 00 exit Did the linked commit create the special internal representation so that they cannot be common code sinked, or is there some other issue going on, or am I misunderstanding something? Thanks YiFei Zhu > > The bad code generation of latest LLVM: > > > > 0000000000000000 <h>: > > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > > 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> > > 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> > > > > 0000000000000018 <LBB0_2>: > > 3: b7 03 00 00 04 00 00 00 r3 = 4 > > 4: bf 12 00 00 00 00 00 00 r2 = r1 > > 5: 0f 32 00 00 00 00 00 00 r2 += r3 > > 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) > > 7: 85 00 00 00 09 00 00 00 call 9 > > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > > > 0000000000000070 <LBB0_3>: > > 14: b7 00 00 00 00 00 00 00 r0 = 0 > > 15: 95 00 00 00 00 00 00 00 exit > > > > The good code generation of LLVM 11.1.0: > > > > 0000000000000000 <h>: > > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > > 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> > > 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > > 3: 85 00 00 00 09 00 00 00 call 9 > > 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > > > 0000000000000050 <LBB0_2>: > > 10: b7 00 00 00 00 00 00 00 r0 = 0 > > 11: 95 00 00 00 00 00 00 00 exit > > > > A bisect points me to this commit in LLVM as the first bad commit: > > > > commit 54d9f743c8b0f501288119123cf1828bf7ade69c > > Author: Yonghong Song <yhs@fb.com> > > Date: Wed Sep 2 22:56:41 2020 -0700 > > > > BPF: move AbstractMemberAccess and PreserveDIType passes to > > EP_EarlyAsPossible > > > > Move abstractMemberAccess and PreserveDIType passes as early as > > possible, right after clang code generation. > > > > [...] > > > > Differential Revision: https://reviews.llvm.org/D87153 > > > > YiFei Zhu > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-23 16:15 ` YiFei Zhu @ 2021-11-23 20:08 ` Yonghong Song 2021-11-23 20:14 ` YiFei Zhu 0 siblings, 1 reply; 7+ messages in thread From: Yonghong Song @ 2021-11-23 20:08 UTC (permalink / raw) To: YiFei Zhu; +Cc: bpf, Stanislav Fomichev, Fangrui Song On 11/23/21 8:15 AM, YiFei Zhu wrote: > On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@fb.com> wrote: >> On 11/22/21 12:44 PM, YiFei Zhu wrote: >>> On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote: >>>> >>>> Hi >>>> >>>> I've been investigating the use of BPF CO-RE. I discovered that if I >>>> include vmlinux.h and have all structures annotated with >>>> __attribute__((preserve_access_index)), including the context struct, >>>> then a prog that accesses an array field in the context struct, in >>>> some particular way, cannot pass the verifier. >>>> >>>> A bunch of manual reduction plus creduce gives me this output: >>>> >>>> struct bpf_sock_ops { >>>> int family; >>>> int remote_ip6[]; >>>> } __attribute__((preserve_access_index)); >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >>>> int a = d->family; >>>> int *c = d->remote_ip6; >>>> c[2] = a; >>>> return 0; >>>> } >>>> >>>> With Debian clang version 11.1.0-4+build1, this compiles to >>>> >>>> 0000000000000000 <b>: >>>> 0: b7 02 00 00 04 00 00 00 r2 = 4 >>>> 1: bf 13 00 00 00 00 00 00 r3 = r1 >>>> 2: 0f 23 00 00 00 00 00 00 r3 += r2 >>>> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) >>>> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 >>>> 5: b7 00 00 00 00 00 00 00 r0 = 0 >>>> 6: 95 00 00 00 00 00 00 00 exit >>>> >>>> And the prog will be rejected with this verifier log: >>>> >>>> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >>>> 0: (b7) r2 = 32 >>>> 1: (bf) r3 = r1 >>>> 2: (0f) r3 += r2 >>>> last_idx 2 first_idx 0 >>>> regs=4 stack=0 before 1: (bf) r3 = r1 >>>> regs=4 stack=0 before 0: (b7) r2 = 32 >>>> ; int a = d->family; >>>> 3: (61) r1 = *(u32 *)(r1 +20) >>>> ; c[2] = a; >>>> 4: (63) *(u32 *)(r3 +8) = r1 >>>> dereference of modified ctx ptr R3 off=32 disallowed >>>> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states >>>> 0 peak_states 0 mark_read 0 >>>> >>>> Looking at check_ctx_reg() and its callsite at check_mem_access() in >>>> verifier.c, it seems that the verifier really does not like when the >>>> context pointer has an offset, in this case the field offset of >>>> d->remote_ip6. >>>> >>>> I thought this is just an issue with array fields, that field offset >>>> relocations may have trouble expressing two field accesses (one struct >>>> member, one array memory). However, further testing reveals that this >>>> is not the case, because if I simplify out the local variables, the >>>> error is gone: >>>> >>>> struct bpf_sock_ops { >>>> int family; >>>> int remote_ip6[]; >>>> } __attribute__((preserve_access_index)); >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { >>>> d->remote_ip6[2] = d->family; >>>> return 0; >>>> } >>>> >>>> is compiled to: >>>> >>>> 0000000000000000 <b>: >>>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) >>>> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 >>>> 2: b7 00 00 00 00 00 00 00 r0 = 0 >>>> 3: 95 00 00 00 00 00 00 00 exit >>>> >>>> and is loaded as: >>>> >>>> ; d->remote_ip6[2] = d->family; >>>> 0: (61) r2 = *(u32 *)(r1 +20) >>>> ; d->remote_ip6[2] = d->family; >>>> 1: (63) *(u32 *)(r1 +40) = r2 >>>> invalid bpf_context access off=40 size=4 >>>> >>>> I believe this error is because d->remote_ip6 is read-only, that this >>>> modification might be more of a product of creduce, but we can see >>>> that the CO-RE adjected offset of the array element from the context >>>> pointer is correct: 32 to remote_ip6, 8 array index, so total offset >>>> is 40. >>>> >>>> Also note that removal of __attribute__((preserve_access_index)) from >>>> the first (miscompiled) program produces exactly the same bytecode as >>>> this new program (with no locals). >>>> >>>> What is going on here? Why does the access of an array in context in >>>> this particular way cause it to generate code that would not pass the >>>> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too >>>> strict? >>> >>> Additionally, testing the latest LLVM main branch, this test case, >>> which does not touch array fields at all, fails but succeeded with >>> clang version 11.1.0: >>> >>> struct bpf_sock_ops { >>> int op; >>> int bpf_sock_ops_cb_flags; >>> } __attribute__((preserve_access_index)); >>> enum { a, b } static (*c)() = (void *)9; >>> enum d { e } f; >>> enum d g; >>> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { >>> switch (i->op) { >>> case a: >>> f = g = c(i, i->bpf_sock_ops_cb_flags); >>> break; >>> case b: >>> f = g = c(i, i->bpf_sock_ops_cb_flags); >>> } >>> return 0; >>> } >> >> This is another issue which actually appears (even in bpf mailing list) >> multiple times. >> >> The following change should fix the issue: >> >> $ diff t1.c t1-good.c >> --- t1.c 2021-11-22 16:00:13.915921544 -0800 >> +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800 >> @@ -5,13 +5,14 @@ >> enum { a, b } static (*c)() = (void *)9; >> enum d { e } f; >> enum d g; >> + #define __barrier asm volatile("" ::: "memory") >> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { >> switch (i->op) { >> case a: >> - f = g = c(i, i->bpf_sock_ops_cb_flags); >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; >> break; >> case b: >> - f = g = c(i, i->bpf_sock_ops_cb_flags); >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; >> } >> return 0; >> } >> >> Basically add a compiler barrier at the end of case statements >> to prevent common code sinking. >> >> In the above case, for the original code, latest compiler did an >> optimization like >> case a: >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); >> case b: >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); >> common: >> val = load r1, tmp >> ... >> >> Note that reloc_offset is not sinked to the common code >> due to its special internal representation. >> >> To avoid such a code generation, add compiler barrier to >> the end of case statement to prevent load sinking in which case >> we will have >> val = load r1, reloc_offset(...) >> and verifier will be happy about this. >> >> The commit you listed below had a big change which may enable >> the above compiler optimization and llvm11 may just not do >> the code sinking optimization in this particular instance. >> >> I guess the compiler could still enforce this. But since it does >> not know whether the memory access is for context or not, doing >> so might hurt performance. But any way, this has appeared multiple >> times internally and also in the mailing list. We will take a further >> look. > > I see, thanks for the explanations. What is interesting is that prior > to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated > only once. The disassembly matches that of > case a: > case b: > tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > val = load r1, tmp > > Whereas with the compiler barriers, both compilers generate (no common code): > > 0000000000000000 <h>: > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > 1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3> > 2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4> > 3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > 4: 85 00 00 00 09 00 00 00 call 9 > 5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > 11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4> > > 0000000000000060 <LBB0_3>: > 12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > 13: 85 00 00 00 09 00 00 00 call 9 > 14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > 17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > 19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 00000000000000a0 <LBB0_4>: > 20: b7 00 00 00 00 00 00 00 r0 = 0 > 21: 95 00 00 00 00 00 00 00 exit > > Did the linked commit create the special internal representation so > that they cannot be common code sinked, or is there some other issue > going on, or am I misunderstanding something? Yes, the linked commit added a special builtin with additional ever-increasing argument to prevent reloc_offset from sinking. This is to ensure the relocation related codes won't be separated into different basic blocks. But this won't be able to prevent the issue you described in the above. > > Thanks > YiFei Zhu >>> The bad code generation of latest LLVM: >>> >>> 0000000000000000 <h>: >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) >>> 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> >>> 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> >>> >>> 0000000000000018 <LBB0_2>: >>> 3: b7 03 00 00 04 00 00 00 r3 = 4 >>> 4: bf 12 00 00 00 00 00 00 r2 = r1 >>> 5: 0f 32 00 00 00 00 00 00 r2 += r3 >>> 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) >>> 7: 85 00 00 00 09 00 00 00 call 9 >>> 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll >>> 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 >>> 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll >>> 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 >>> >>> 0000000000000070 <LBB0_3>: >>> 14: b7 00 00 00 00 00 00 00 r0 = 0 >>> 15: 95 00 00 00 00 00 00 00 exit >>> >>> The good code generation of LLVM 11.1.0: >>> >>> 0000000000000000 <h>: >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) >>> 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> >>> 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) >>> 3: 85 00 00 00 09 00 00 00 call 9 >>> 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll >>> 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 >>> 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll >>> 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 >>> >>> 0000000000000050 <LBB0_2>: >>> 10: b7 00 00 00 00 00 00 00 r0 = 0 >>> 11: 95 00 00 00 00 00 00 00 exit >>> >>> A bisect points me to this commit in LLVM as the first bad commit: >>> >>> commit 54d9f743c8b0f501288119123cf1828bf7ade69c >>> Author: Yonghong Song <yhs@fb.com> >>> Date: Wed Sep 2 22:56:41 2020 -0700 >>> >>> BPF: move AbstractMemberAccess and PreserveDIType passes to >>> EP_EarlyAsPossible >>> >>> Move abstractMemberAccess and PreserveDIType passes as early as >>> possible, right after clang code generation. >>> >>> [...] >>> >>> Differential Revision: https://reviews.llvm.org/D87153 >>> >>> YiFei Zhu >>> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-23 20:08 ` Yonghong Song @ 2021-11-23 20:14 ` YiFei Zhu 0 siblings, 0 replies; 7+ messages in thread From: YiFei Zhu @ 2021-11-23 20:14 UTC (permalink / raw) To: Yonghong Song; +Cc: bpf, Stanislav Fomichev, Fangrui Song On Tue, Nov 23, 2021 at 12:08 PM Yonghong Song <yhs@fb.com> wrote: > On 11/23/21 8:15 AM, YiFei Zhu wrote: > > On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@fb.com> wrote: > >> On 11/22/21 12:44 PM, YiFei Zhu wrote: > >>> On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@google.com> wrote: > >>>> > >>>> Hi > >>>> > >>>> I've been investigating the use of BPF CO-RE. I discovered that if I > >>>> include vmlinux.h and have all structures annotated with > >>>> __attribute__((preserve_access_index)), including the context struct, > >>>> then a prog that accesses an array field in the context struct, in > >>>> some particular way, cannot pass the verifier. > >>>> > >>>> A bunch of manual reduction plus creduce gives me this output: > >>>> > >>>> struct bpf_sock_ops { > >>>> int family; > >>>> int remote_ip6[]; > >>>> } __attribute__((preserve_access_index)); > >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> int a = d->family; > >>>> int *c = d->remote_ip6; > >>>> c[2] = a; > >>>> return 0; > >>>> } > >>>> > >>>> With Debian clang version 11.1.0-4+build1, this compiles to > >>>> > >>>> 0000000000000000 <b>: > >>>> 0: b7 02 00 00 04 00 00 00 r2 = 4 > >>>> 1: bf 13 00 00 00 00 00 00 r3 = r1 > >>>> 2: 0f 23 00 00 00 00 00 00 r3 += r2 > >>>> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) > >>>> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 > >>>> 5: b7 00 00 00 00 00 00 00 r0 = 0 > >>>> 6: 95 00 00 00 00 00 00 00 exit > >>>> > >>>> And the prog will be rejected with this verifier log: > >>>> > >>>> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> 0: (b7) r2 = 32 > >>>> 1: (bf) r3 = r1 > >>>> 2: (0f) r3 += r2 > >>>> last_idx 2 first_idx 0 > >>>> regs=4 stack=0 before 1: (bf) r3 = r1 > >>>> regs=4 stack=0 before 0: (b7) r2 = 32 > >>>> ; int a = d->family; > >>>> 3: (61) r1 = *(u32 *)(r1 +20) > >>>> ; c[2] = a; > >>>> 4: (63) *(u32 *)(r3 +8) = r1 > >>>> dereference of modified ctx ptr R3 off=32 disallowed > >>>> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states > >>>> 0 peak_states 0 mark_read 0 > >>>> > >>>> Looking at check_ctx_reg() and its callsite at check_mem_access() in > >>>> verifier.c, it seems that the verifier really does not like when the > >>>> context pointer has an offset, in this case the field offset of > >>>> d->remote_ip6. > >>>> > >>>> I thought this is just an issue with array fields, that field offset > >>>> relocations may have trouble expressing two field accesses (one struct > >>>> member, one array memory). However, further testing reveals that this > >>>> is not the case, because if I simplify out the local variables, the > >>>> error is gone: > >>>> > >>>> struct bpf_sock_ops { > >>>> int family; > >>>> int remote_ip6[]; > >>>> } __attribute__((preserve_access_index)); > >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> d->remote_ip6[2] = d->family; > >>>> return 0; > >>>> } > >>>> > >>>> is compiled to: > >>>> > >>>> 0000000000000000 <b>: > >>>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>>> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 > >>>> 2: b7 00 00 00 00 00 00 00 r0 = 0 > >>>> 3: 95 00 00 00 00 00 00 00 exit > >>>> > >>>> and is loaded as: > >>>> > >>>> ; d->remote_ip6[2] = d->family; > >>>> 0: (61) r2 = *(u32 *)(r1 +20) > >>>> ; d->remote_ip6[2] = d->family; > >>>> 1: (63) *(u32 *)(r1 +40) = r2 > >>>> invalid bpf_context access off=40 size=4 > >>>> > >>>> I believe this error is because d->remote_ip6 is read-only, that this > >>>> modification might be more of a product of creduce, but we can see > >>>> that the CO-RE adjected offset of the array element from the context > >>>> pointer is correct: 32 to remote_ip6, 8 array index, so total offset > >>>> is 40. > >>>> > >>>> Also note that removal of __attribute__((preserve_access_index)) from > >>>> the first (miscompiled) program produces exactly the same bytecode as > >>>> this new program (with no locals). > >>>> > >>>> What is going on here? Why does the access of an array in context in > >>>> this particular way cause it to generate code that would not pass the > >>>> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too > >>>> strict? > >>> > >>> Additionally, testing the latest LLVM main branch, this test case, > >>> which does not touch array fields at all, fails but succeeded with > >>> clang version 11.1.0: > >>> > >>> struct bpf_sock_ops { > >>> int op; > >>> int bpf_sock_ops_cb_flags; > >>> } __attribute__((preserve_access_index)); > >>> enum { a, b } static (*c)() = (void *)9; > >>> enum d { e } f; > >>> enum d g; > >>> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > >>> switch (i->op) { > >>> case a: > >>> f = g = c(i, i->bpf_sock_ops_cb_flags); > >>> break; > >>> case b: > >>> f = g = c(i, i->bpf_sock_ops_cb_flags); > >>> } > >>> return 0; > >>> } > >> > >> This is another issue which actually appears (even in bpf mailing list) > >> multiple times. > >> > >> The following change should fix the issue: > >> > >> $ diff t1.c t1-good.c > >> --- t1.c 2021-11-22 16:00:13.915921544 -0800 > >> +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800 > >> @@ -5,13 +5,14 @@ > >> enum { a, b } static (*c)() = (void *)9; > >> enum d { e } f; > >> enum d g; > >> + #define __barrier asm volatile("" ::: "memory") > >> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > >> switch (i->op) { > >> case a: > >> - f = g = c(i, i->bpf_sock_ops_cb_flags); > >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > >> break; > >> case b: > >> - f = g = c(i, i->bpf_sock_ops_cb_flags); > >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > >> } > >> return 0; > >> } > >> > >> Basically add a compiler barrier at the end of case statements > >> to prevent common code sinking. > >> > >> In the above case, for the original code, latest compiler did an > >> optimization like > >> case a: > >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > >> case b: > >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > >> common: > >> val = load r1, tmp > >> ... > >> > >> Note that reloc_offset is not sinked to the common code > >> due to its special internal representation. > >> > >> To avoid such a code generation, add compiler barrier to > >> the end of case statement to prevent load sinking in which case > >> we will have > >> val = load r1, reloc_offset(...) > >> and verifier will be happy about this. > >> > >> The commit you listed below had a big change which may enable > >> the above compiler optimization and llvm11 may just not do > >> the code sinking optimization in this particular instance. > >> > >> I guess the compiler could still enforce this. But since it does > >> not know whether the memory access is for context or not, doing > >> so might hurt performance. But any way, this has appeared multiple > >> times internally and also in the mailing list. We will take a further > >> look. > > > > I see, thanks for the explanations. What is interesting is that prior > > to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated > > only once. The disassembly matches that of > > case a: > > case b: > > tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > > val = load r1, tmp > > > > Whereas with the compiler barriers, both compilers generate (no common code): > > > > 0000000000000000 <h>: > > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > > 1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3> > > 2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4> > > 3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > > 4: 85 00 00 00 09 00 00 00 call 9 > > 5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4> > > > > 0000000000000060 <LBB0_3>: > > 12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > > 13: 85 00 00 00 09 00 00 00 call 9 > > 14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > > > 00000000000000a0 <LBB0_4>: > > 20: b7 00 00 00 00 00 00 00 r0 = 0 > > 21: 95 00 00 00 00 00 00 00 exit > > > > Did the linked commit create the special internal representation so > > that they cannot be common code sinked, or is there some other issue > > going on, or am I misunderstanding something? > > Yes, the linked commit added a special builtin with additional > ever-increasing argument to prevent reloc_offset from sinking. > This is to ensure the relocation related codes won't be separated into > different basic blocks. But this won't be able to prevent the issue > you described in the above. Ah. I see. Thanks for the explanations. Looking forward to see the fixes! YiFei Zhu > > Thanks > > YiFei Zhu > >>> The bad code generation of latest LLVM: > >>> > >>> 0000000000000000 <h>: > >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>> 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> > >>> 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> > >>> > >>> 0000000000000018 <LBB0_2>: > >>> 3: b7 03 00 00 04 00 00 00 r3 = 4 > >>> 4: bf 12 00 00 00 00 00 00 r2 = r1 > >>> 5: 0f 32 00 00 00 00 00 00 r2 += r3 > >>> 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) > >>> 7: 85 00 00 00 09 00 00 00 call 9 > >>> 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> > >>> 0000000000000070 <LBB0_3>: > >>> 14: b7 00 00 00 00 00 00 00 r0 = 0 > >>> 15: 95 00 00 00 00 00 00 00 exit > >>> > >>> The good code generation of LLVM 11.1.0: > >>> > >>> 0000000000000000 <h>: > >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>> 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> > >>> 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > >>> 3: 85 00 00 00 09 00 00 00 call 9 > >>> 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> > >>> 0000000000000050 <LBB0_2>: > >>> 10: b7 00 00 00 00 00 00 00 r0 = 0 > >>> 11: 95 00 00 00 00 00 00 00 exit > >>> > >>> A bisect points me to this commit in LLVM as the first bad commit: > >>> > >>> commit 54d9f743c8b0f501288119123cf1828bf7ade69c > >>> Author: Yonghong Song <yhs@fb.com> > >>> Date: Wed Sep 2 22:56:41 2020 -0700 > >>> > >>> BPF: move AbstractMemberAccess and PreserveDIType passes to > >>> EP_EarlyAsPossible > >>> > >>> Move abstractMemberAccess and PreserveDIType passes as early as > >>> possible, right after clang code generation. > >>> > >>> [...] > >>> > >>> Differential Revision: https://reviews.llvm.org/D87153 > >>> > >>> YiFei Zhu > >>> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: BPF CO-RE and array fields in context struct 2021-11-22 16:19 BPF CO-RE and array fields in context struct YiFei Zhu 2021-11-22 20:44 ` YiFei Zhu @ 2021-11-22 23:56 ` Yonghong Song 1 sibling, 0 replies; 7+ messages in thread From: Yonghong Song @ 2021-11-22 23:56 UTC (permalink / raw) To: YiFei Zhu, bpf; +Cc: Stanislav Fomichev, Fangrui Song On 11/22/21 8:19 AM, YiFei Zhu wrote: > Hi > > I've been investigating the use of BPF CO-RE. I discovered that if I > include vmlinux.h and have all structures annotated with > __attribute__((preserve_access_index)), including the context struct, > then a prog that accesses an array field in the context struct, in > some particular way, cannot pass the verifier. > > A bunch of manual reduction plus creduce gives me this output: > > struct bpf_sock_ops { > int family; > int remote_ip6[]; > } __attribute__((preserve_access_index)); > __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > int a = d->family; > int *c = d->remote_ip6; > c[2] = a; > return 0; > } > > With Debian clang version 11.1.0-4+build1, this compiles to > > 0000000000000000 <b>: > 0: b7 02 00 00 04 00 00 00 r2 = 4 > 1: bf 13 00 00 00 00 00 00 r3 = r1 > 2: 0f 23 00 00 00 00 00 00 r3 += r2 > 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) > 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 > 5: b7 00 00 00 00 00 00 00 r0 = 0 > 6: 95 00 00 00 00 00 00 00 exit > > And the prog will be rejected with this verifier log: > > ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > 0: (b7) r2 = 32 > 1: (bf) r3 = r1 > 2: (0f) r3 += r2 > last_idx 2 first_idx 0 > regs=4 stack=0 before 1: (bf) r3 = r1 > regs=4 stack=0 before 0: (b7) r2 = 32 > ; int a = d->family; > 3: (61) r1 = *(u32 *)(r1 +20) > ; c[2] = a; > 4: (63) *(u32 *)(r3 +8) = r1 > dereference of modified ctx ptr R3 off=32 disallowed > processed 5 insns (limit 1000000) max_states_per_insn 0 total_states > 0 peak_states 0 mark_read 0 Thanks for reporting the issue. The example you had here exposed an llvm limitation. For the following code: > int *c = d->remote_ip6; > c[2] = a; The relocation will apply to d->remote_ip6. And the below code sequence is for c = d->remote_ip6: > 0: (b7) r2 = 32 > 1: (bf) r3 = r1 > 2: (0f) r3 += r2 And later c[2] store has the issue as you described above. Note that llvm does not generate relocation for array access itself. It needs to be part of access chain like d->remote_ip6[2] to be relocatable. > > Looking at check_ctx_reg() and its callsite at check_mem_access() in > verifier.c, it seems that the verifier really does not like when the > context pointer has an offset, in this case the field offset of > d->remote_ip6. > > I thought this is just an issue with array fields, that field offset > relocations may have trouble expressing two field accesses (one struct > member, one array memory). However, further testing reveals that this > is not the case, because if I simplify out the local variables, the > error is gone: > > struct bpf_sock_ops { > int family; > int remote_ip6[]; > } __attribute__((preserve_access_index)); > __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > d->remote_ip6[2] = d->family; > return 0; > } > > is compiled to: > > 0000000000000000 <b>: > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 > 2: b7 00 00 00 00 00 00 00 r0 = 0 > 3: 95 00 00 00 00 00 00 00 exit > > and is loaded as: > > ; d->remote_ip6[2] = d->family; > 0: (61) r2 = *(u32 *)(r1 +20) > ; d->remote_ip6[2] = d->family; > 1: (63) *(u32 *)(r1 +40) = r2 > invalid bpf_context access off=40 size=4 > > I believe this error is because d->remote_ip6 is read-only, that this > modification might be more of a product of creduce, but we can see > that the CO-RE adjected offset of the array element from the context > pointer is correct: 32 to remote_ip6, 8 array index, so total offset > is 40. In this case, the statement is d->remote_ip6[2] = d->family; And the whole "d->remote_ip6[2]" is relocated. So we generate a single instruction for it: *(u32 *)(r1 +40) = ... So the workaround is to have all related field in the statement up to the load/store operation so we have ONE complete relocation. > > Also note that removal of __attribute__((preserve_access_index)) from > the first (miscompiled) program produces exactly the same bytecode as > this new program (with no locals). > > What is going on here? Why does the access of an array in context in > this particular way cause it to generate code that would not pass the > verifier? Is it a bug in Clang/LLVM, or is it the verifier being too > strict? How can we fix this issue? We could generate IR with relocation information for standalone array operation and later llvm can chain them together. I will take a further look later for a fix. > > Thanks > YiFei Zhu > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-11-23 20:14 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-22 16:19 BPF CO-RE and array fields in context struct YiFei Zhu 2021-11-22 20:44 ` YiFei Zhu 2021-11-23 0:24 ` Yonghong Song 2021-11-23 16:15 ` YiFei Zhu 2021-11-23 20:08 ` Yonghong Song 2021-11-23 20:14 ` YiFei Zhu 2021-11-22 23:56 ` Yonghong Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).