All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf@vger.kernel.org, "Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map
Date: Wed, 23 Feb 2022 08:39:25 +0530	[thread overview]
Message-ID: <20220223030925.uxevhkloz7dznkal@apollo.legion> (raw)
In-Reply-To: <20220222064619.hsadxbwzeg3go6jb@ast-mbp.dhcp.thefacebook.com>

On Tue, Feb 22, 2022 at 12:16:19PM IST, Alexei Starovoitov wrote:
> On Sun, Feb 20, 2022 at 07:18:01PM +0530, Kumar Kartikeya Dwivedi wrote:
> > This patch allows user to embed PTR_TO_BTF_ID in map value, such that
> > loading it marks the destination register as having the appropriate
> > register type and such a pointer can be dereferenced like usual
> > PTR_TO_BTF_ID and be passed to various BPF helpers.
> >
> > This feature can be useful to store an object in a map for a long time,
> > and then inspect it later. Since PTR_TO_BTF_ID is safe against invalid
> > access, verifier doesn't need to perform any complex lifetime checks. It
> > can be useful in cases where user already knows pointer will remain
> > valid, so any dereference at a later time (possibly in entirely
> > different BPF program invocation) will yield correct results as far the
> > data read from kernel memory is concerned.
> >
> > Note that it is quite possible such BTF ID pointer is invalid, in this
> > case the verifier's built-in exception handling mechanism where it
> > converts loads into PTR_TO_BTF_ID into PROBE_MEM loads, would handle the
> > invalid case. Next patch which adds referenced PTR_TO_BTF_ID would need
> > to take more care in ensuring a correct value is stored in the BPF map.
> >
> > The user indicates that a certain pointer must be treated as
> > PTR_TO_BTF_ID by using a BTF type tag 'btf_id' on the pointed to type of
> > the pointer. Then, this information is recorded in the object BTF which
> > will be passed into the kernel by way of map's BTF information.
> >
> > The kernel then records the type, and offset of all such pointers, and
> > finds their corresponding built-in kernel type by the name and BTF kind.
> >
> > Later, during verification this information is used that access to such
> > pointers is sized correctly, and done at a proper offset into the map
> > value. Only BPF_LDX, BPF_STX, and BPF_ST with 0 (to denote NULL) are
> > allowed instructions that can access such a pointer. On BPF_LDX, the
> > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > it is checked whether the source register type is same PTR_TO_BTF_ID,
> > and whether the BTF ID (reg->btf and reg->btf_id) matches the type
> > specified in the map value's definition.
> >
> > Hence, the verifier allows flexible access to kernel data across program
> > invocations in a type safe manner, without compromising on the runtime
> > safety of the kernel.
> >
> > Next patch will extend this support to referenced PTR_TO_BTF_ID.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h     |  30 +++++++-
> >  include/linux/btf.h     |   3 +
> >  kernel/bpf/btf.c        | 127 ++++++++++++++++++++++++++++++++++
> >  kernel/bpf/map_in_map.c |   5 +-
> >  kernel/bpf/syscall.c    | 137 ++++++++++++++++++++++++++++++++++++-
> >  kernel/bpf/verifier.c   | 148 ++++++++++++++++++++++++++++++++++++++++
> >  6 files changed, 446 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f19abc59b6cd..ce45ffb79f82 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -155,6 +155,23 @@ struct bpf_map_ops {
> >  	const struct bpf_iter_seq_info *iter_seq_info;
> >  };
> >
> > +enum {
> > +	/* Support at most 8 pointers in a BPF map value */
> > +	BPF_MAP_VALUE_OFF_MAX = 8,
> > +};
> > +
> > +struct bpf_map_value_off_desc {
> > +	u32 offset;
> > +	u32 btf_id;
> > +	struct btf *btf;
> > +	struct module *module;
> > +};
> > +
> > +struct bpf_map_value_off {
> > +	u32 nr_off;
> > +	struct bpf_map_value_off_desc off[];
> > +};
> > +
> >  struct bpf_map {
> >  	/* The first two cachelines with read-mostly members of which some
> >  	 * are also accessed in fast-path (e.g. ops, max_entries).
> > @@ -171,6 +188,7 @@ struct bpf_map {
> >  	u64 map_extra; /* any per-map-type extra fields */
> >  	u32 map_flags;
> >  	int spin_lock_off; /* >=0 valid offset, <0 error */
> > +	struct bpf_map_value_off *ptr_off_tab;
> >  	int timer_off; /* >=0 valid offset, <0 error */
> >  	u32 id;
> >  	int numa_node;
> > @@ -184,7 +202,7 @@ struct bpf_map {
> >  	char name[BPF_OBJ_NAME_LEN];
> >  	bool bypass_spec_v1;
> >  	bool frozen; /* write-once; write-protected by freeze_mutex */
> > -	/* 14 bytes hole */
> > +	/* 6 bytes hole */
> >
> >  	/* The 3rd and 4th cacheline with misc members to avoid false sharing
> >  	 * particularly with refcounting.
> > @@ -217,6 +235,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
> >  	return map->timer_off >= 0;
> >  }
> >
> > +static inline bool map_value_has_ptr_to_btf_id(const struct bpf_map *map)
> > +{
> > +	return !IS_ERR_OR_NULL(map->ptr_off_tab);
> > +}
> > +
> >  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >  {
> >  	if (unlikely(map_value_has_spin_lock(map)))
> > @@ -1490,6 +1513,11 @@ void bpf_prog_put(struct bpf_prog *prog);
> >  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
> >  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
> >
> > +struct bpf_map_value_off_desc *bpf_map_ptr_off_contains(struct bpf_map *map, u32 offset);
> > +void bpf_map_free_ptr_off_tab(struct bpf_map *map);
> > +struct bpf_map_value_off *bpf_map_copy_ptr_off_tab(const struct bpf_map *map);
> > +bool bpf_map_equal_ptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
> > +
> >  struct bpf_map *bpf_map_get(u32 ufd);
> >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> >  struct bpf_map *__bpf_map_get(struct fd f);
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 36bc09b8e890..6592183aeb23 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -26,6 +26,7 @@ struct btf_type;
> >  union bpf_attr;
> >  struct btf_show;
> >  struct btf_id_set;
> > +struct bpf_map;
> >
> >  struct btf_kfunc_id_set {
> >  	struct module *owner;
> > @@ -123,6 +124,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> >  			   u32 expected_offset, u32 expected_size);
> >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > +int btf_find_ptr_to_btf_id(const struct btf *btf, const struct btf_type *t,
> > +			   struct bpf_map *map);
> >  bool btf_type_is_void(const struct btf_type *t);
> >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 55f6ccac3388..1edb5710e155 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -3122,6 +3122,7 @@ static void btf_struct_log(struct btf_verifier_env *env,
> >  enum {
> >  	BTF_FIELD_SPIN_LOCK,
> >  	BTF_FIELD_TIMER,
> > +	BTF_FIELD_KPTR,
> >  };
> >
> >  static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
> > @@ -3140,6 +3141,106 @@ static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
> >  	return 0;
> >  }
> >
> > +static s32 btf_find_by_name_kind_all(const char *name, u32 kind, struct btf **btfp);
> > +
> > +static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
> > +			       u32 off, int sz, void *data)
> > +{
> > +	struct bpf_map_value_off *tab;
> > +	struct bpf_map *map = data;
> > +	struct module *mod = NULL;
> > +	bool btf_id_tag = false;
> > +	struct btf *kernel_btf;
> > +	int nr_off, ret;
> > +	s32 id;
> > +
> > +	/* For PTR, sz is always == 8 */
> > +	if (!btf_type_is_ptr(t))
> > +		return 0;
> > +	t = btf_type_by_id(btf, t->type);
> > +
> > +	while (btf_type_is_type_tag(t)) {
> > +		if (!strcmp("kernel.bpf.btf_id", __btf_name_by_offset(btf, t->name_off))) {
>
> All of these strings consume space.
> Multiple tags consume space too.
> I would just do:
> #define __kptr __attribute__((btf_type_tag("kptr")))
> #define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
> #define __kptr_percpu __attribute__((btf_type_tag("kptr_percpu")))
> #define __kptr_user __attribute__((btf_type_tag("kptr_user")))
>

Ok.

> > +			/* repeated tag */
> > +			if (btf_id_tag) {
> > +				ret = -EINVAL;
> > +				goto end;
> > +			}
> > +			btf_id_tag = true;
> > +		} else if (!strncmp("kernel.", __btf_name_by_offset(btf, t->name_off),
> > +			   sizeof("kernel.") - 1)) {
> > +			/* TODO: Should we reject these when loading BTF? */
> > +			/* Unavailable tag in reserved tag namespace */
>
> I don't think we need to reserve the tag space.
> There is little risk to break progs with future tags.
> I would just drop this 'if'.
>

Fine with dropping, but what is the expected behavior when userspace has set a
tag in map value BTF that we give some meaning in the kernel later?

> > +			ret = -EACCES;
> > +			goto end;
> > +		}
> > +		/* Look for next tag */
> > +		t = btf_type_by_id(btf, t->type);
> > +	}
> > +	if (!btf_id_tag)
> > +		return 0;
> > +
> > +	/* Get the base type */
> > +	if (btf_type_is_modifier(t))
> > +		t = btf_type_skip_modifiers(btf, t->type, NULL);
> > +	/* Only pointer to struct is allowed */
> > +	if (!__btf_type_is_struct(t)) {
> > +		ret = -EINVAL;
> > +		goto end;
> > +	}
> > +
> > +	id = btf_find_by_name_kind_all(__btf_name_by_offset(btf, t->name_off),
> > +				       BTF_INFO_KIND(t->info), &kernel_btf);
> > +	if (id < 0) {
> > +		ret = id;
> > +		goto end;
> > +	}
> > +
> > +	nr_off = map->ptr_off_tab ? map->ptr_off_tab->nr_off : 0;
> > +	if (nr_off == BPF_MAP_VALUE_OFF_MAX) {
> > +		ret = -E2BIG;
> > +		goto end_btf;
> > +	}
> > +
> > +	tab = krealloc(map->ptr_off_tab, offsetof(struct bpf_map_value_off, off[nr_off + 1]),
> > +		       GFP_KERNEL | __GFP_NOWARN);
>
> Argh.
> If the function is called btf_find_field() it should do 'find' and only 'find'.
> It should be side effect free and should find _one_ field.
> If you want a function with side effcts it should be called something like btf_walk_fields.
>
> For this case how about side effect free btf_find_fieldS() that will populate array
> struct bpf_field_info {
>   struct btf *type; /* set for spin_lock, timer, kptr */
>   u32 off;
>   int flags; /* ref|percpu|user for kptr */
> };
>
> cnt = btf_find_fields(prog_btf, value_type, BTF_FIELD_SPIN_LOCK|TIMER|KPTR, fields);
>
> btf_find_struct_field/btf_find_datasec_var will keep the count and will error
> when it reaches BPF_MAP_VALUE_OFF_MAX.
> switch (field_type) {
> case BTF_FIELD_SPIN_LOCK:
>    btf_find_field_struct(... "bpf_spin_lock",
>                              sizeof(struct bpf_spin_lock),
>                              __alignof__(struct bpf_spin_lock),
>                              fields + i);
> case BTF_FIELD_TIMER:
>    btf_find_field_struct(... "bpf_timer", sizeof, alignof, fields + i);
> case BTF_FIELD_KPTR:
>    btf_find_field_kptr(... fields + i);
> }
>
> btf_find_by_name_kind_all (or new name bpf_find_btf_id)
> will be done after btf_find_fields() is over.
> dtor will be found after as well.
> struct bpf_map_value_off will be allocated once.
>

Ack, sounds good.

> > +	if (!tab) {
> > +		ret = -ENOMEM;
> > +		goto end_btf;
> > +	}
> > +	/* Initialize nr_off for newly allocated ptr_off_tab */
> > +	if (!map->ptr_off_tab)
> > +		tab->nr_off = 0;
> > +	map->ptr_off_tab = tab;
> > +
> > +	/* We take reference to make sure valid pointers into module data don't
> > +	 * become invalid across program invocation.
> > +	 */
>
> what is the point of grabbing mod ref?
> This patch needs btf only and its refcnt will be incremented by bpf_find_btf_id.
> Is that because of future dtor ?
> Then it should be part of that patch.
>

Right, screwed it up while rebasing. Will fix.

> > +	if (btf_is_module(kernel_btf)) {
> > +		mod = btf_try_get_module(kernel_btf);
> > +		if (!mod) {
> > +			ret = -ENXIO;
> > +			goto end_btf;
> > +		}
> > +	}
> > +
> > +	tab->off[nr_off].offset = off;
> > +	tab->off[nr_off].btf_id = id;
> > +	tab->off[nr_off].btf    = kernel_btf;
> > +	tab->off[nr_off].module = mod;
> > +	tab->nr_off++;
> > +
> > +	return 0;
> > +end_btf:
> > +	/* Reference is only raised for module BTF */
> > +	if (btf_is_module(kernel_btf))
> > +		btf_put(kernel_btf);
>
> see earlier suggestion. this 'if' can be dropped if we btf_get for vmlinux_btf too.
>
> > +end:
> > +	bpf_map_free_ptr_off_tab(map);
> > +	map->ptr_off_tab = ERR_PTR(ret);
> > +	return ret;
> > +}

--
Kartikeya

  reply	other threads:[~2022-02-23  3:09 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-20 13:47 [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
2022-02-20 13:47 ` [PATCH bpf-next v1 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
2022-02-22  5:28   ` Alexei Starovoitov
2022-02-23  3:05     ` Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 02/15] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 03/15] bpf: Allow storing PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-22  6:46   ` Alexei Starovoitov
2022-02-23  3:09     ` Kumar Kartikeya Dwivedi [this message]
2022-02-23 21:46       ` Alexei Starovoitov
2022-02-20 13:48 ` [PATCH bpf-next v1 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
2022-02-22  6:53   ` Alexei Starovoitov
2022-02-22  7:10     ` Kumar Kartikeya Dwivedi
2022-02-22 16:20       ` Alexei Starovoitov
2022-02-23  3:04         ` Kumar Kartikeya Dwivedi
2022-02-23 21:52           ` Alexei Starovoitov
2022-02-24  8:43             ` Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 05/15] bpf: Allow storing PTR_TO_PERCPU_BTF_ID " Kumar Kartikeya Dwivedi
2022-02-20 20:40   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 06/15] bpf: Allow storing __user PTR_TO_BTF_ID " Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 07/15] bpf: Prevent escaping of pointers loaded from maps Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
2022-02-22  7:04   ` Alexei Starovoitov
2022-02-23  3:13     ` Kumar Kartikeya Dwivedi
2022-02-23 21:41       ` Alexei Starovoitov
2022-02-20 13:48 ` [PATCH bpf-next v1 09/15] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 10/15] bpf: Wire up freeing of referenced PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-20 21:43   ` kernel test robot
2022-02-20 22:55   ` kernel test robot
2022-02-21  0:39   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 11/15] bpf: Teach verifier about kptr_get style kfunc helpers Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 12/15] net/netfilter: Add bpf_ct_kptr_get helper Kumar Kartikeya Dwivedi
2022-02-21  4:35   ` kernel test robot
2022-02-20 13:48 ` [PATCH bpf-next v1 13/15] libbpf: Add __kptr* macros to bpf_helpers.h Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 14/15] selftests/bpf: Add C tests for PTR_TO_BTF_ID in map Kumar Kartikeya Dwivedi
2022-02-20 13:48 ` [PATCH bpf-next v1 15/15] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
2022-02-22  6:05 ` [PATCH bpf-next v1 00/15] Introduce typed pointer support in BPF maps Song Liu
2022-02-22  8:21   ` Kumar Kartikeya Dwivedi
2022-02-23  7:29     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220223030925.uxevhkloz7dznkal@apollo.legion \
    --to=memxor@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hawk@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.