All of lore.kernel.org
 help / color / mirror / Atom feed
* BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values
@ 2021-09-10 15:36 Juraj Vijtiuk
  2021-09-10 17:44 ` Yonghong Song
  0 siblings, 1 reply; 3+ messages in thread
From: Juraj Vijtiuk @ 2021-09-10 15:36 UTC (permalink / raw)
  To: bpf; +Cc: Dario Barać, David Marcinkovic, Luka Perkov

Hello,

while developing some cgroup socket programs, we have noticed that
some BPF_PROG_TYPE_CGROUP_SOCK programs return different values when
the bpf_sock context is accessed, depending on how they are accessed.
One example of the issue would be access to ctx->family in programs
that attach to BPF_CGROUP_INET6_POST_BIND and
BPF_CGROUP_INET4_POST_BIND. A direct ctx->family access returns the
correct value, while BPF_CORE_READ(ctx, family) returns random values.
The BPF C program and an example userspace C loader are attached
below, with an example trace_pipe output.

So far we have looked at the generated BPF byte code with llvm-objdump
and everything looked fine there, the main difference being in the way
the access is done, as expected. The BPF_CORE_READ macro expands into
a bpf_probe_read_kernel() call with arguments wrapped in
__builtin_preserve_access_index. bpf_probe_read_* helper calls are
supported for BPF_PROG_TYPE_CGROUP_SOCKS so that shouldn't be an
issue. Next, we looked at libbpf debug output, where everything looked
ok too. The part of the output with relocations is attached below.

We have tested this with various kernel versions, including 5.10, 5.11
and 5.13 on x86_64 and 5.11 on 32 bit ARM. The issue appeared on all
of those kernels and architectures.

At this point we're not sure what to look at next so any ideas on what
might cause the issues or suggestions on what to test next would be
greatly appreciated.

Regards,
Juraj Vijtiuk

example.bpf.c
----------------------------------------

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("cgroup/post_bind4")
int cgroup_post_bind4_prog(struct bpf_sock *ctx)
{
    u32 family1 = 0;
    u32 family2 = 0;

    family1 = ctx->family;
    family2 = BPF_CORE_READ(ctx, family);
    bpf_printk("family1 = %u, family2 = %u\n", family1, family2);

    return 0;
}

char LICENSE[] SEC("license") = "GPL";

example.c
----------------------------------------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/resource.h>
#include <argp.h>

#include <bpf/libbpf.h>
#include <bpf/bpf.h>

#include "example.skel.h"

void read_trace_pipe(void)
{
    int trace_fd;

    trace_fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY, 0);
    if (trace_fd < 0)
        return;

    while (1) {
        static char buf[4096];
        ssize_t sz;

        sz = read(trace_fd, buf, sizeof(buf) - 1);
        if (sz > 0) {
            buf[sz] = 0;
            puts(buf);
        }
    }
}

int libbpf_print_fn(enum libbpf_print_level level,
const char *format, va_list args)
{
    return vfprintf(stderr, format, args);
}

int main(int argc, char **argv) {
    struct example_bpf *obj;
    int err = 0;
    struct rlimit rlim = {
        .rlim_cur = 512UL << 20,
        .rlim_max = 512UL << 20,
    };

    err = setrlimit(RLIMIT_MEMLOCK, &rlim);
    if (err) {
        fprintf(stderr, "failed to change rlimit\n");
        return 1;
    }

    libbpf_set_print(libbpf_print_fn);
    obj = example_bpf__open();
    if (!obj) {
        fprintf(stderr, "failed to open and/or load BPF object\n");
        return 1;
    }

    err = example_bpf__load(obj);
    if (err) {
        fprintf(stderr, "failed to load BPF object %d\n", err);
        goto cleanup;
    }

    const char *cgroup_path = "/sys/fs/cgroup";
    int cgroup_fd = open(cgroup_path, O_DIRECTORY | O_RDONLY);

    struct bpf_program *prog = obj->progs.cgroup_post_bind4_prog;
    obj->links.cgroup_post_bind4_prog =
bpf_program__attach_cgroup(prog, cgroup_fd);
    err = libbpf_get_error(obj->links.cgroup_post_bind4_prog);
    if (err) {
        fprintf(stderr, "failed to attach BPF program %d\n", err);
        goto cleanup;
    }

    read_trace_pipe();

cleanup:
    example_bpf__destroy(obj);
    return err != 0;
}

trace_pipe output
----------------------------------------
Chrome_IOThread-26477   [006] d..2 385580.114654: bpf_trace_printk:
family1 = 2, family2 = 1747691712
<...>-144100  [004] d..2 385594.936690: bpf_trace_printk: family1 = 2,
family2 = 0

libbpf relocation log
----------------------------------------
libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
libbpf: sec 'cgroup/post_bind4': found 2 CO-RE relocations
libbpf: prog 'cgroup_post_bind4_prog': relo #0: kind <byte_off> (0),
spec is [2] struct bpf_sock.family (0:1 @ offset 4)
libbpf: CO-RE relocating [2] struct bpf_sock: found target candidate
[24518] struct bpf_sock
libbpf: prog 'cgroup_post_bind4_prog': relo #0: matching candidate #0
[24518] struct bpf_sock.family (0:1 @ offset 4)
libbpf: prog 'cgroup_post_bind4_prog': relo #0: patched insn #9
(ALU/ALU64) imm 4 -> 4
libbpf: prog 'cgroup_post_bind4_prog': relo #1: kind <byte_off> (0),
spec is [2] struct bpf_sock.family (0:1 @ offset 4)
libbpf: prog 'cgroup_post_bind4_prog': relo #1: matching candidate #0
[24518] struct bpf_sock.family (0:1 @ offset 4)
libbpf: prog 'cgroup_post_bind4_prog': relo #1: patched insn #12
(LDX/ST/STX) off 4 -> 4

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values
  2021-09-10 15:36 BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values Juraj Vijtiuk
@ 2021-09-10 17:44 ` Yonghong Song
  2021-09-13 10:02   ` Juraj Vijtiuk
  0 siblings, 1 reply; 3+ messages in thread
From: Yonghong Song @ 2021-09-10 17:44 UTC (permalink / raw)
  To: Juraj Vijtiuk, bpf; +Cc: Dario Barać, David Marcinkovic, Luka Perkov



On 9/10/21 8:36 AM, Juraj Vijtiuk wrote:
> Hello,
> 
> while developing some cgroup socket programs, we have noticed that
> some BPF_PROG_TYPE_CGROUP_SOCK programs return different values when
> the bpf_sock context is accessed, depending on how they are accessed.
> One example of the issue would be access to ctx->family in programs
> that attach to BPF_CGROUP_INET6_POST_BIND and
> BPF_CGROUP_INET4_POST_BIND. A direct ctx->family access returns the
> correct value, while BPF_CORE_READ(ctx, family) returns random values.
> The BPF C program and an example userspace C loader are attached
> below, with an example trace_pipe output.
> 
> So far we have looked at the generated BPF byte code with llvm-objdump
> and everything looked fine there, the main difference being in the way
> the access is done, as expected. The BPF_CORE_READ macro expands into
> a bpf_probe_read_kernel() call with arguments wrapped in
> __builtin_preserve_access_index. bpf_probe_read_* helper calls are
> supported for BPF_PROG_TYPE_CGROUP_SOCKS so that shouldn't be an
> issue. Next, we looked at libbpf debug output, where everything looked
> ok too. The part of the output with relocations is attached below.

This is an incorrect usage of CORE. See below.

> 
> We have tested this with various kernel versions, including 5.10, 5.11
> and 5.13 on x86_64 and 5.11 on 32 bit ARM. The issue appeared on all
> of those kernels and architectures.
> 
> At this point we're not sure what to look at next so any ideas on what
> might cause the issues or suggestions on what to test next would be
> greatly appreciated.
> 
> Regards,
> Juraj Vijtiuk
> 
> example.bpf.c
> ----------------------------------------
> 
> #include "vmlinux.h"
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_core_read.h>
> 
> SEC("cgroup/post_bind4")
> int cgroup_post_bind4_prog(struct bpf_sock *ctx)
> {
>      u32 family1 = 0;
>      u32 family2 = 0;
> 
>      family1 = ctx->family;
>      family2 = BPF_CORE_READ(ctx, family);
>      bpf_printk("family1 = %u, family2 = %u\n", family1, family2);

We have assembly code below:
        0:       b7 02 00 00 04 00 00 00 r2 = 4
        1:       bf 13 00 00 00 00 00 00 r3 = r1
        2:       0f 23 00 00 00 00 00 00 r3 += r2
        3:       61 16 04 00 00 00 00 00 r6 = *(u32 *)(r1 + 4)
        4:       bf a1 00 00 00 00 00 00 r1 = r10
        5:       07 01 00 00 e0 ff ff ff r1 += -32
        6:       b4 02 00 00 04 00 00 00 w2 = 4
        7:       85 00 00 00 71 00 00 00 call 113
        8:       61 a4 e0 ff 00 00 00 00 r4 = *(u32 *)(r10 - 32)

Looks like they are the same one insn #3 read context + 4
and another insn #7 also read context + 4.

But for insn #3, the verifier will rewrite it proper kernel field,
         case offsetof(struct bpf_sock, family):
                 *insn++ = BPF_LDX_MEM(
                         BPF_FIELD_SIZEOF(struct sock_common, skc_family),
                         si->dst_reg, si->src_reg,
                         bpf_target_off(struct sock_common,
                                        skc_family,
                                        sizeof_field(struct sock_common,
                                                     skc_family),
                                        target_size));
                 break;

and this is not the same for insn #7.
That is why they have different results. The CORE is used for accessing
kernel data structures, the "ctx" is not really a kernel data structure
(rather a UAPI interface) in most cases.

> 
>      return 0;
> }
> 
> char LICENSE[] SEC("license") = "GPL";
> 
> example.c
> ----------------------------------------
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/resource.h>
> #include <argp.h>
> 
> #include <bpf/libbpf.h>
> #include <bpf/bpf.h>
> 
> #include "example.skel.h"
> 
> void read_trace_pipe(void)
> {
>      int trace_fd;
> 
>      trace_fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY, 0);
>      if (trace_fd < 0)
>          return;
> 
>      while (1) {
>          static char buf[4096];
>          ssize_t sz;
> 
>          sz = read(trace_fd, buf, sizeof(buf) - 1);
>          if (sz > 0) {
>              buf[sz] = 0;
>              puts(buf);
>          }
>      }
> }
> 
> int libbpf_print_fn(enum libbpf_print_level level,
> const char *format, va_list args)
> {
>      return vfprintf(stderr, format, args);
> }
> 
> int main(int argc, char **argv) {
>      struct example_bpf *obj;
>      int err = 0;
>      struct rlimit rlim = {
>          .rlim_cur = 512UL << 20,
>          .rlim_max = 512UL << 20,
>      };
> 
>      err = setrlimit(RLIMIT_MEMLOCK, &rlim);
>      if (err) {
>          fprintf(stderr, "failed to change rlimit\n");
>          return 1;
>      }
> 
>      libbpf_set_print(libbpf_print_fn);
>      obj = example_bpf__open();
>      if (!obj) {
>          fprintf(stderr, "failed to open and/or load BPF object\n");
>          return 1;
>      }
> 
>      err = example_bpf__load(obj);
>      if (err) {
>          fprintf(stderr, "failed to load BPF object %d\n", err);
>          goto cleanup;
>      }
> 
>      const char *cgroup_path = "/sys/fs/cgroup";
>      int cgroup_fd = open(cgroup_path, O_DIRECTORY | O_RDONLY);
> 
>      struct bpf_program *prog = obj->progs.cgroup_post_bind4_prog;
>      obj->links.cgroup_post_bind4_prog =
> bpf_program__attach_cgroup(prog, cgroup_fd);
>      err = libbpf_get_error(obj->links.cgroup_post_bind4_prog);
>      if (err) {
>          fprintf(stderr, "failed to attach BPF program %d\n", err);
>          goto cleanup;
>      }
> 
>      read_trace_pipe();
> 
> cleanup:
>      example_bpf__destroy(obj);
>      return err != 0;
> }
> 
> trace_pipe output
> ----------------------------------------
> Chrome_IOThread-26477   [006] d..2 385580.114654: bpf_trace_printk:
> family1 = 2, family2 = 1747691712
> <...>-144100  [004] d..2 385594.936690: bpf_trace_printk: family1 = 2,
> family2 = 0
> 
> libbpf relocation log
> ----------------------------------------
> libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> libbpf: sec 'cgroup/post_bind4': found 2 CO-RE relocations
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: kind <byte_off> (0),
> spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: CO-RE relocating [2] struct bpf_sock: found target candidate
> [24518] struct bpf_sock
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: matching candidate #0
> [24518] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #0: patched insn #9
> (ALU/ALU64) imm 4 -> 4
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: kind <byte_off> (0),
> spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: matching candidate #0
> [24518] struct bpf_sock.family (0:1 @ offset 4)
> libbpf: prog 'cgroup_post_bind4_prog': relo #1: patched insn #12
> (LDX/ST/STX) off 4 -> 4
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values
  2021-09-10 17:44 ` Yonghong Song
@ 2021-09-13 10:02   ` Juraj Vijtiuk
  0 siblings, 0 replies; 3+ messages in thread
From: Juraj Vijtiuk @ 2021-09-13 10:02 UTC (permalink / raw)
  To: Yonghong Song; +Cc: bpf, Dario Barać, David Marcinkovic, Luka Perkov

On Fri, Sep 10, 2021 at 7:45 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 9/10/21 8:36 AM, Juraj Vijtiuk wrote:
> > Hello,
> >
> > while developing some cgroup socket programs, we have noticed that
> > some BPF_PROG_TYPE_CGROUP_SOCK programs return different values when
> > the bpf_sock context is accessed, depending on how they are accessed.
> > One example of the issue would be access to ctx->family in programs
> > that attach to BPF_CGROUP_INET6_POST_BIND and
> > BPF_CGROUP_INET4_POST_BIND. A direct ctx->family access returns the
> > correct value, while BPF_CORE_READ(ctx, family) returns random values.
> > The BPF C program and an example userspace C loader are attached
> > below, with an example trace_pipe output.
> >
> > So far we have looked at the generated BPF byte code with llvm-objdump
> > and everything looked fine there, the main difference being in the way
> > the access is done, as expected. The BPF_CORE_READ macro expands into
> > a bpf_probe_read_kernel() call with arguments wrapped in
> > __builtin_preserve_access_index. bpf_probe_read_* helper calls are
> > supported for BPF_PROG_TYPE_CGROUP_SOCKS so that shouldn't be an
> > issue. Next, we looked at libbpf debug output, where everything looked
> > ok too. The part of the output with relocations is attached below.
>
> This is an incorrect usage of CORE. See below.
>
> >
> > We have tested this with various kernel versions, including 5.10, 5.11
> > and 5.13 on x86_64 and 5.11 on 32 bit ARM. The issue appeared on all
> > of those kernels and architectures.
> >
> > At this point we're not sure what to look at next so any ideas on what
> > might cause the issues or suggestions on what to test next would be
> > greatly appreciated.
> >
> > Regards,
> > Juraj Vijtiuk
> >
> > example.bpf.c
> > ----------------------------------------
> >
> > #include "vmlinux.h"
> > #include <bpf/bpf_helpers.h>
> > #include <bpf/bpf_core_read.h>
> >
> > SEC("cgroup/post_bind4")
> > int cgroup_post_bind4_prog(struct bpf_sock *ctx)
> > {
> >      u32 family1 = 0;
> >      u32 family2 = 0;
> >
> >      family1 = ctx->family;
> >      family2 = BPF_CORE_READ(ctx, family);
> >      bpf_printk("family1 = %u, family2 = %u\n", family1, family2);
>
> We have assembly code below:
>         0:       b7 02 00 00 04 00 00 00 r2 = 4
>         1:       bf 13 00 00 00 00 00 00 r3 = r1
>         2:       0f 23 00 00 00 00 00 00 r3 += r2
>         3:       61 16 04 00 00 00 00 00 r6 = *(u32 *)(r1 + 4)
>         4:       bf a1 00 00 00 00 00 00 r1 = r10
>         5:       07 01 00 00 e0 ff ff ff r1 += -32
>         6:       b4 02 00 00 04 00 00 00 w2 = 4
>         7:       85 00 00 00 71 00 00 00 call 113
>         8:       61 a4 e0 ff 00 00 00 00 r4 = *(u32 *)(r10 - 32)
>
> Looks like they are the same one insn #3 read context + 4
> and another insn #7 also read context + 4.
>
> But for insn #3, the verifier will rewrite it proper kernel field,
>          case offsetof(struct bpf_sock, family):
>                  *insn++ = BPF_LDX_MEM(
>                          BPF_FIELD_SIZEOF(struct sock_common, skc_family),
>                          si->dst_reg, si->src_reg,
>                          bpf_target_off(struct sock_common,
>                                         skc_family,
>                                         sizeof_field(struct sock_common,
>                                                      skc_family),
>                                         target_size));
>                  break;
>
> and this is not the same for insn #7.
> That is why they have different results. The CORE is used for accessing
> kernel data structures, the "ctx" is not really a kernel data structure
> (rather a UAPI interface) in most cases.
>

This makes sense and seems obvious now, after reading the explanation above.
Thank you very much for the quick and detailed response.

> >
> >      return 0;
> > }
> >
> > char LICENSE[] SEC("license") = "GPL";
> >
> > example.c
> > ----------------------------------------
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <sys/resource.h>
> > #include <argp.h>
> >
> > #include <bpf/libbpf.h>
> > #include <bpf/bpf.h>
> >
> > #include "example.skel.h"
> >
> > void read_trace_pipe(void)
> > {
> >      int trace_fd;
> >
> >      trace_fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY, 0);
> >      if (trace_fd < 0)
> >          return;
> >
> >      while (1) {
> >          static char buf[4096];
> >          ssize_t sz;
> >
> >          sz = read(trace_fd, buf, sizeof(buf) - 1);
> >          if (sz > 0) {
> >              buf[sz] = 0;
> >              puts(buf);
> >          }
> >      }
> > }
> >
> > int libbpf_print_fn(enum libbpf_print_level level,
> > const char *format, va_list args)
> > {
> >      return vfprintf(stderr, format, args);
> > }
> >
> > int main(int argc, char **argv) {
> >      struct example_bpf *obj;
> >      int err = 0;
> >      struct rlimit rlim = {
> >          .rlim_cur = 512UL << 20,
> >          .rlim_max = 512UL << 20,
> >      };
> >
> >      err = setrlimit(RLIMIT_MEMLOCK, &rlim);
> >      if (err) {
> >          fprintf(stderr, "failed to change rlimit\n");
> >          return 1;
> >      }
> >
> >      libbpf_set_print(libbpf_print_fn);
> >      obj = example_bpf__open();
> >      if (!obj) {
> >          fprintf(stderr, "failed to open and/or load BPF object\n");
> >          return 1;
> >      }
> >
> >      err = example_bpf__load(obj);
> >      if (err) {
> >          fprintf(stderr, "failed to load BPF object %d\n", err);
> >          goto cleanup;
> >      }
> >
> >      const char *cgroup_path = "/sys/fs/cgroup";
> >      int cgroup_fd = open(cgroup_path, O_DIRECTORY | O_RDONLY);
> >
> >      struct bpf_program *prog = obj->progs.cgroup_post_bind4_prog;
> >      obj->links.cgroup_post_bind4_prog =
> > bpf_program__attach_cgroup(prog, cgroup_fd);
> >      err = libbpf_get_error(obj->links.cgroup_post_bind4_prog);
> >      if (err) {
> >          fprintf(stderr, "failed to attach BPF program %d\n", err);
> >          goto cleanup;
> >      }
> >
> >      read_trace_pipe();
> >
> > cleanup:
> >      example_bpf__destroy(obj);
> >      return err != 0;
> > }
> >
> > trace_pipe output
> > ----------------------------------------
> > Chrome_IOThread-26477   [006] d..2 385580.114654: bpf_trace_printk:
> > family1 = 2, family2 = 1747691712
> > <...>-144100  [004] d..2 385594.936690: bpf_trace_printk: family1 = 2,
> > family2 = 0
> >
> > libbpf relocation log
> > ----------------------------------------
> > libbpf: loading kernel BTF '/sys/kernel/btf/vmlinux': 0
> > libbpf: sec 'cgroup/post_bind4': found 2 CO-RE relocations
> > libbpf: prog 'cgroup_post_bind4_prog': relo #0: kind <byte_off> (0),
> > spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> > libbpf: CO-RE relocating [2] struct bpf_sock: found target candidate
> > [24518] struct bpf_sock
> > libbpf: prog 'cgroup_post_bind4_prog': relo #0: matching candidate #0
> > [24518] struct bpf_sock.family (0:1 @ offset 4)
> > libbpf: prog 'cgroup_post_bind4_prog': relo #0: patched insn #9
> > (ALU/ALU64) imm 4 -> 4
> > libbpf: prog 'cgroup_post_bind4_prog': relo #1: kind <byte_off> (0),
> > spec is [2] struct bpf_sock.family (0:1 @ offset 4)
> > libbpf: prog 'cgroup_post_bind4_prog': relo #1: matching candidate #0
> > [24518] struct bpf_sock.family (0:1 @ offset 4)
> > libbpf: prog 'cgroup_post_bind4_prog': relo #1: patched insn #12
> > (LDX/ST/STX) off 4 -> 4
> >

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-09-13 10:05 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-10 15:36 BPF_PROG_TYPE_CGROUP_SOCK bpf_sock ctx member direct access and BPF_CORE_READ return different values Juraj Vijtiuk
2021-09-10 17:44 ` Yonghong Song
2021-09-13 10:02   ` Juraj Vijtiuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.