Re: [PATCH bpf-next 11/13] bpf: libbpf: Add STRUCT_OPS support

From: Martin Lau <kafai@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>,
	"Kernel Team" <Kernel-team@fb.com>,
	Networking <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next 11/13] bpf: libbpf: Add STRUCT_OPS support
Date: Wed, 18 Dec 2019 07:03:45 +0000	[thread overview]
Message-ID: <20191218070341.fd2ypexmeca5cefa@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAEf4BzbJoso7A0dn=xhOkFMOcKqZ6wYp=XoqGiL+FO+0VKqh5g@mail.gmail.com>

On Tue, Dec 17, 2019 at 07:07:23PM -0800, Andrii Nakryiko wrote:
> On Fri, Dec 13, 2019 at 4:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > This patch adds BPF STRUCT_OPS support to libbpf.
> >
> > The only sec_name convention is SEC("struct_ops") to identify the
> > struct ops implemented in BPF, e.g.
> > SEC("struct_ops")
> > struct tcp_congestion_ops dctcp = {
> >         .init           = (void *)dctcp_init,  /* <-- a bpf_prog */
> >         /* ... some more func prts ... */
> >         .name           = "bpf_dctcp",
> > };
> >
> > In the bpf_object__open phase, libbpf will look for the "struct_ops"
> > elf section and find out what is the btf-type the "struct_ops" is
> > implementing.  Note that the btf-type here is referring to
> > a type in the bpf_prog.o's btf.  It will then collect (through SHT_REL)
> > where are the bpf progs that the func ptrs are referring to.
> >
> > In the bpf_object__load phase, the prepare_struct_ops() will load
> > the btf_vmlinux and obtain the corresponding kernel's btf-type.
> > With the kernel's btf-type, it can then set the prog->type,
> > prog->attach_btf_id and the prog->expected_attach_type.  Thus,
> > the prog's properties do not rely on its section name.
> >
> > Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching
> > process is as simple as: member-name match + btf-kind match + size match.
> > If these matching conditions fail, libbpf will reject.
> > The current targeting support is "struct tcp_congestion_ops" which
> > most of its members are function pointers.
> > The member ordering of the bpf_prog's btf-type can be different from
> > the btf_vmlinux's btf-type.
> >
> > Once the prog's properties are all set,
> > the libbpf will proceed to load all the progs.
> >
> > After that, register_struct_ops() will create a map, finalize the
> > map-value by populating it with the prog-fd, and then register this
> > "struct_ops" to the kernel by updating the map-value to the map.
> >
> > By default, libbpf does not unregister the struct_ops from the kernel
> > during bpf_object__close().  It can be changed by setting the new
> > "unreg_st_ops" in bpf_object_open_opts.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> 
> This looks pretty good to me. The big two things is exposing structops
> as real struct bpf_map, so that users can interact with it using
> libbpf APIs, as well as splitting struct_ops map creation and
> registration. bpf_object__load() should only make sure all maps are
> created, progs are loaded/verified, but none of BPF program can yet be
> called. Then attach is the phase where registration happens.
Thanks for the review.

[ ... ]

> >  static inline __u64 ptr_to_u64(const void *ptr)
> >  {
> >         return (__u64) (unsigned long) ptr;
> > @@ -233,6 +239,32 @@ struct bpf_map {
> >         bool reused;
> >  };
> >
> > +struct bpf_struct_ops {
> > +       const char *var_name;
> > +       const char *tname;
> > +       const struct btf_type *type;
> > +       struct bpf_program **progs;
> > +       __u32 *kern_func_off;
> > +       /* e.g. struct tcp_congestion_ops in bpf_prog's btf format */
> > +       void *data;
> > +       /* e.g. struct __bpf_tcp_congestion_ops in btf_vmlinux's btf
> 
> Using __bpf_ prefix for this struct_ops-specific types is a bit too
> generic (e.g., for raw_tp stuff Alexei used btf_trace_). So maybe make
> it btf_ops_ or btf_structops_?
Is it a concern on name collision?

The prefix pick is to use a more representative name.
struct_ops use many bpf pieces and btf is one of them.
Very soon, all new codes will depend on BTF and btf_ prefix
could become generic also.

Unlike tracepoint, there is no non-btf version of struct_ops.  

> 
> 
> > +        * format.
> > +        * struct __bpf_tcp_congestion_ops {
> > +        *      [... some other kernel fields ...]
> > +        *      struct tcp_congestion_ops data;
> > +        * }
> > +        * kern_vdata in the sizeof(struct __bpf_tcp_congestion_ops).
> 
> Comment isn't very clear.. do you mean that data pointed to by
> kern_vdata is of sizeof(...) bytes?
> 
> > +        * prepare_struct_ops() will populate the "data" into
> > +        * "kern_vdata".
> > +        */
> > +       void *kern_vdata;
> > +       __u32 type_id;
> > +       __u32 kern_vtype_id;
> > +       __u32 kern_vtype_size;
> > +       int fd;
> > +       bool unreg;
> 
> This unreg flag (and default behavior to not unregister) is bothering
> me a bit.. Shouldn't this be controlled by map's lifetime, at least.
> E.g., if no one pins that map - then struct_ops should be unregistered
> on map destruction. If application wants to keep BPF programs
> attached, it should make sure to pin map, before userspace part exits?
> Is this problematic in any way?
I don't think it should in the struct_ops case.  I think of the
struct_ops map is a set of progs "attach" to a subsystem (tcp_cong
in this case) and this map-progs stay (or keep attaching) until it is
detached.  Like other attached bpf_prog keeps running without
caring if the bpf_prog is pinned or not.

About the "bool unreg;", the default can be changed to true if
it makes more sense.

[ ... ]

> 
> > +
> > +               kern_data = st_ops->kern_vdata + st_ops->kern_func_off[i];
> > +               *(unsigned long *)kern_data = prog_fd;
> > +       }
> > +
> > +       map_attr.map_type = BPF_MAP_TYPE_STRUCT_OPS;
> > +       map_attr.key_size = sizeof(unsigned int);
> > +       map_attr.value_size = st_ops->kern_vtype_size;
> > +       map_attr.max_entries = 1;
> > +       map_attr.btf_fd = btf__fd(obj->btf);
> > +       map_attr.btf_vmlinux_value_type_id = st_ops->kern_vtype_id;
> > +       map_attr.name = st_ops->var_name;
> > +
> > +       fd = bpf_create_map_xattr(&map_attr);
> 
> we should try to reuse bpf_object__init_internal_map(). This will add
> struct bpf_map which users can iterate over and look up by name, etc.
> We had similar discussion when Daniel was adding  global data maps,
> and we conclusively decided that these special maps have to be
> represented in libbpf as struct bpf_map as well.
I will take a look.

> 
> > +       if (fd < 0) {
> > +               err = -errno;
> > +               pr_warn("struct_ops register %s: Error in creating struct_ops map\n",
> > +                       tname);
> > +               return err;
> > +       }
> > +
> > +       err = bpf_map_update_elem(fd, &zero, st_ops->kern_vdata, 0);