All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, "Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>
Subject: Re: [PATCH bpf-next v2 03/15] bpf: Allow storing unreferenced kptr in map
Date: Sat, 19 Mar 2022 15:57:44 -0700	[thread overview]
Message-ID: <20220319225744.xlebrwxerrw3isds@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20220319215046.u6fttplj5znzgoaf@apollo>

On Sun, Mar 20, 2022 at 03:20:46AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Sun, Mar 20, 2022 at 03:09:20AM IST, Kumar Kartikeya Dwivedi wrote:
> > On Sun, Mar 20, 2022 at 02:47:54AM IST, Alexei Starovoitov wrote:
> > > On Sun, Mar 20, 2022 at 12:22:51AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > On Sat, Mar 19, 2022 at 11:45:38PM IST, Alexei Starovoitov wrote:
> > > > > On Thu, Mar 17, 2022 at 05:29:45PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > > This commit introduces a new pointer type 'kptr' which can be embedded
> > > > > > in a map value as holds a PTR_TO_BTF_ID stored by a BPF program during
> > > > > > its invocation. Storing to such a kptr, BPF program's PTR_TO_BTF_ID
> > > > > > register must have the same type as in the map value's BTF, and loading
> > > > > > a kptr marks the destination register as PTR_TO_BTF_ID with the correct
> > > > > > kernel BTF and BTF ID.
> > > > > >
> > > > > > Such kptr are unreferenced, i.e. by the time another invocation of the
> > > > > > BPF program loads this pointer, the object which the pointer points to
> > > > > > may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
> > > > > > patched to PROBE_MEM loads by the verifier, it would safe to allow user
> > > > > > to still access such invalid pointer, but passing such pointers into
> > > > > > BPF helpers and kfuncs should not be permitted. A future patch in this
> > > > > > series will close this gap.
> > > > > >
> > > > > > The flexibility offered by allowing programs to dereference such invalid
> > > > > > pointers while being safe at runtime frees the verifier from doing
> > > > > > complex lifetime tracking. As long as the user may ensure that the
> > > > > > object remains valid, it can ensure data read by it from the kernel
> > > > > > object is valid.
> > > > > >
> > > > > > The user indicates that a certain pointer must be treated as kptr
> > > > > > capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using
> > > > > > a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this
> > > > > > information is recorded in the object BTF which will be passed into the
> > > > > > kernel by way of map's BTF information. The name and kind from the map
> > > > > > value BTF is used to look up the in-kernel type, and the actual BTF and
> > > > > > BTF ID is recorded in the map struct in a new kptr_off_tab member. For
> > > > > > now, only storing pointers to structs is permitted.
> > > > > >
> > > > > > An example of this specification is shown below:
> > > > > >
> > > > > > 	#define __kptr __attribute__((btf_type_tag("kptr")))
> > > > > >
> > > > > > 	struct map_value {
> > > > > > 		...
> > > > > > 		struct task_struct __kptr *task;
> > > > > > 		...
> > > > > > 	};
> > > > > >
> > > > > > Then, in a BPF program, user may store PTR_TO_BTF_ID with the type
> > > > > > task_struct into the map, and then load it later.
> > > > > >
> > > > > > Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as
> > > > > > the verifier cannot know whether the value is NULL or not statically, it
> > > > > > must treat all potential loads at that map value offset as loading a
> > > > > > possibly NULL pointer.
> > > > > >
> > > > > > Only BPF_LDX, BPF_STX, and BPF_ST with insn->imm = 0 (to denote NULL)
> > > > > > are allowed instructions that can access such a pointer. On BPF_LDX, the
> > > > > > destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX,
> > > > > > it is checked whether the source register type is a PTR_TO_BTF_ID with
> > > > > > same BTF type as specified in the map BTF. The access size must always
> > > > > > be BPF_DW.
> > > > > >
> > > > > > For the map in map support, the kptr_off_tab for outer map is copied
> > > > > > from the inner map's kptr_off_tab. It was chosen to do a deep copy
> > > > > > instead of introducing a refcount to kptr_off_tab, because the copy only
> > > > > > needs to be done when paramterizing using inner_map_fd in the map in map
> > > > > > case, hence would be unnecessary for all other users.
> > > > > >
> > > > > > It is not permitted to use MAP_FREEZE command and mmap for BPF map
> > > > > > having kptr, similar to the bpf_timer case.
> > > > > >
> > > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > > > ---
> > > > > >  include/linux/bpf.h     |  29 +++++-
> > > > > >  include/linux/btf.h     |   2 +
> > > > > >  kernel/bpf/btf.c        | 151 +++++++++++++++++++++++++----
> > > > > >  kernel/bpf/map_in_map.c |   5 +-
> > > > > >  kernel/bpf/syscall.c    | 110 ++++++++++++++++++++-
> > > > > >  kernel/bpf/verifier.c   | 207 ++++++++++++++++++++++++++++++++--------
> > > > > >  6 files changed, 442 insertions(+), 62 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index 88449fbbe063..f35920d279dd 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -155,6 +155,22 @@ struct bpf_map_ops {
> > > > > >  	const struct bpf_iter_seq_info *iter_seq_info;
> > > > > >  };
> > > > > >
> > > > > > +enum {
> > > > > > +	/* Support at most 8 pointers in a BPF map value */
> > > > > > +	BPF_MAP_VALUE_OFF_MAX = 8,
> > > > > > +};
> > > > > > +
> > > > > > +struct bpf_map_value_off_desc {
> > > > > > +	u32 offset;
> > > > > > +	u32 btf_id;
> > > > > > +	struct btf *btf;
> > > > > > +};
> > > > > > +
> > > > > > +struct bpf_map_value_off {
> > > > > > +	u32 nr_off;
> > > > > > +	struct bpf_map_value_off_desc off[];
> > > > > > +};
> > > > > > +
> > > > > >  struct bpf_map {
> > > > > >  	/* The first two cachelines with read-mostly members of which some
> > > > > >  	 * are also accessed in fast-path (e.g. ops, max_entries).
> > > > > > @@ -171,6 +187,7 @@ struct bpf_map {
> > > > > >  	u64 map_extra; /* any per-map-type extra fields */
> > > > > >  	u32 map_flags;
> > > > > >  	int spin_lock_off; /* >=0 valid offset, <0 error */
> > > > > > +	struct bpf_map_value_off *kptr_off_tab;
> > > > > >  	int timer_off; /* >=0 valid offset, <0 error */
> > > > > >  	u32 id;
> > > > > >  	int numa_node;
> > > > > > @@ -184,7 +201,7 @@ struct bpf_map {
> > > > > >  	char name[BPF_OBJ_NAME_LEN];
> > > > > >  	bool bypass_spec_v1;
> > > > > >  	bool frozen; /* write-once; write-protected by freeze_mutex */
> > > > > > -	/* 14 bytes hole */
> > > > > > +	/* 6 bytes hole */
> > > > > >
> > > > > >  	/* The 3rd and 4th cacheline with misc members to avoid false sharing
> > > > > >  	 * particularly with refcounting.
> > > > > > @@ -217,6 +234,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
> > > > > >  	return map->timer_off >= 0;
> > > > > >  }
> > > > > >
> > > > > > +static inline bool map_value_has_kptr(const struct bpf_map *map)
> > > > > > +{
> > > > > > +	return !IS_ERR_OR_NULL(map->kptr_off_tab);
> > > > > > +}
> > > > > > +
> > > > > >  static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > > > > >  {
> > > > > >  	if (unlikely(map_value_has_spin_lock(map)))
> > > > > > @@ -1497,6 +1519,11 @@ void bpf_prog_put(struct bpf_prog *prog);
> > > > > >  void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
> > > > > >  void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
> > > > > >
> > > > > > +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
> > > > > > +void bpf_map_free_kptr_off_tab(struct bpf_map *map);
> > > > > > +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
> > > > > > +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
> > > > > > +
> > > > > >  struct bpf_map *bpf_map_get(u32 ufd);
> > > > > >  struct bpf_map *bpf_map_get_with_uref(u32 ufd);
> > > > > >  struct bpf_map *__bpf_map_get(struct fd f);
> > > > > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > > > > index 36bc09b8e890..5b578dc81c04 100644
> > > > > > --- a/include/linux/btf.h
> > > > > > +++ b/include/linux/btf.h
> > > > > > @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
> > > > > >  			   u32 expected_offset, u32 expected_size);
> > > > > >  int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
> > > > > >  int btf_find_timer(const struct btf *btf, const struct btf_type *t);
> > > > > > +struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
> > > > > > +					const struct btf_type *t);
> > > > > >  bool btf_type_is_void(const struct btf_type *t);
> > > > > >  s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
> > > > > >  const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
> > > > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > > > index 5b2824332880..9ac9364ef533 100644
> > > > > > --- a/kernel/bpf/btf.c
> > > > > > +++ b/kernel/bpf/btf.c
> > > > > > @@ -3164,33 +3164,79 @@ static void btf_struct_log(struct btf_verifier_env *env,
> > > > > >  enum {
> > > > > >  	BTF_FIELD_SPIN_LOCK,
> > > > > >  	BTF_FIELD_TIMER,
> > > > > > +	BTF_FIELD_KPTR,
> > > > > > +};
> > > > > > +
> > > > > > +enum {
> > > > > > +	BTF_FIELD_IGNORE = 0,
> > > > > > +	BTF_FIELD_FOUND  = 1,
> > > > > >  };
> > > > > >
> > > > > >  struct btf_field_info {
> > > > > > +	const struct btf_type *type;
> > > > > >  	u32 off;
> > > > > >  };
> > > > > >
> > > > > >  static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
> > > > > > -				 u32 off, int sz, struct btf_field_info *info)
> > > > > > +				 u32 off, int sz, struct btf_field_info *info,
> > > > > > +				 int info_cnt, int idx)
> > > > > >  {
> > > > > >  	if (!__btf_type_is_struct(t))
> > > > > > -		return 0;
> > > > > > +		return BTF_FIELD_IGNORE;
> > > > > >  	if (t->size != sz)
> > > > > > -		return 0;
> > > > > > -	if (info->off != -ENOENT)
> > > > > > -		/* only one such field is allowed */
> > > > > > +		return BTF_FIELD_IGNORE;
> > > > > > +	if (idx >= info_cnt)
> > > > >
> > > > > No need to pass info_cnt, idx into this function.
> > > > > Move idx >= info_cnt check into the caller and let
> > > > > caller do 'info++' and pass that.
> > > >
> > > > That was what I did initially, but this check actually needs to happen after we
> > > > see that the field is of interest (i.e. not ignored by btf_find_field_*). Doing
> > > > it in caller limits total fields to info_cnt. Moving those checks out into the
> > > > caller may be the other option, but I didn't like that. I can add a comment if
> > > > it makes things clear.
> > >
> > > don't increment info unconditionally?
> > > only when field is found.
> > >
> >
> > Right now the j++ happens only when we find a field. What I'm saying is that if
> > you now move the idx (which is j in caller) >= info_cnt out into the loop, later
> > iteration will return error even if it is not a timer, spin_lock, or kptr field,
> > so actual check is done inside the function after we know that for this specific
> > case it can only be a timer, spin_lock, or kptr, and we already have no more room
> > to record their info.
> >
> > e.g. there can be a case when we end up at j == info_cnt (all infos used), but
> > we still find a kptr, so we should only return error on seeing j == info_cnt
> > once we know that field is a kptr, because we reached the total limit of kptrs
> > in a map value.
> >
> 
> One other option might be doing the check in caller _after_ we see
> BTF_FIELD_FOUND, but this would require bumping the size to max + 1, so that the
> write to info inside the function doesn't write past the end of the array.

yep. Extra element in the array will work.
Or split btf_find_field_struct() into find and populate.

  reply	other threads:[~2022-03-19 22:57 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 11:59 [PATCH bpf-next v2 00/15] Introduce typed pointer support in BPF maps Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 01/15] bpf: Factor out fd returning from bpf_btf_find_by_name_kind Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 02/15] bpf: Make btf_find_field more generic Kumar Kartikeya Dwivedi
2022-03-19 17:55   ` Alexei Starovoitov
2022-03-19 19:31     ` Kumar Kartikeya Dwivedi
2022-03-19 20:06       ` Kumar Kartikeya Dwivedi
2022-03-19 21:30         ` Alexei Starovoitov
2022-03-17 11:59 ` [PATCH bpf-next v2 03/15] bpf: Allow storing unreferenced kptr in map Kumar Kartikeya Dwivedi
2022-03-19 18:15   ` Alexei Starovoitov
2022-03-19 18:52     ` Kumar Kartikeya Dwivedi
2022-03-19 21:17       ` Alexei Starovoitov
2022-03-19 21:39         ` Kumar Kartikeya Dwivedi
2022-03-19 21:50           ` Kumar Kartikeya Dwivedi
2022-03-19 22:57             ` Alexei Starovoitov [this message]
2022-03-17 11:59 ` [PATCH bpf-next v2 04/15] bpf: Allow storing referenced " Kumar Kartikeya Dwivedi
2022-03-19 18:24   ` Alexei Starovoitov
2022-03-19 18:59     ` Kumar Kartikeya Dwivedi
2022-03-19 21:23       ` Alexei Starovoitov
2022-03-19 21:43         ` Kumar Kartikeya Dwivedi
2022-03-20  0:57           ` Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 05/15] bpf: Allow storing percpu " Kumar Kartikeya Dwivedi
2022-03-19 18:30   ` Alexei Starovoitov
2022-03-19 19:04     ` Kumar Kartikeya Dwivedi
2022-03-19 21:26       ` Alexei Starovoitov
2022-03-19 21:45         ` Kumar Kartikeya Dwivedi
2022-03-19 23:01           ` Alexei Starovoitov
2022-03-17 11:59 ` [PATCH bpf-next v2 06/15] bpf: Allow storing user " Kumar Kartikeya Dwivedi
2022-03-19 18:28   ` Alexei Starovoitov
2022-03-19 19:02     ` Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 07/15] bpf: Prevent escaping of kptr loaded from maps Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 08/15] bpf: Adapt copy_map_value for multiple offset case Kumar Kartikeya Dwivedi
2022-03-19 18:34   ` Alexei Starovoitov
2022-03-19 19:06     ` Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 09/15] bpf: Always raise reference in btf_get_module_btf Kumar Kartikeya Dwivedi
2022-03-19 18:43   ` Alexei Starovoitov
2022-03-20 13:11   ` [bpf] 9a707eb02e: BUG:kernel_NULL_pointer_dereference,address kernel test robot
2022-03-20 13:11     ` kernel test robot
2022-03-17 11:59 ` [PATCH bpf-next v2 10/15] bpf: Populate pairs of btf_id and destructor kfunc in btf Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 11/15] bpf: Wire up freeing of referenced kptr Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 12/15] bpf: Teach verifier about kptr_get kfunc helpers Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 13/15] libbpf: Add kptr type tag macros to bpf_helpers.h Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 14/15] selftests/bpf: Add C tests for kptr Kumar Kartikeya Dwivedi
2022-03-17 11:59 ` [PATCH bpf-next v2 15/15] selftests/bpf: Add verifier " Kumar Kartikeya Dwivedi
2022-03-19 18:50 ` [PATCH bpf-next v2 00/15] Introduce typed pointer support in BPF maps patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220319225744.xlebrwxerrw3isds@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=memxor@gmail.com \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.