All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/6] Dynamic pointers
@ 2022-04-28 21:10 Joanne Koong
  2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patchset implements the basics of dynamic pointers in bpf.

A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
alongside the address it points to. This abstraction is useful in bpf, given
that every memory access in a bpf program must be safe. The verifier and bpf
helper functions can use the metadata to enforce safety guarantees for things 
such as dynamically sized strings and kernel heap allocations.

From the program side, the bpf_dynptr is an opaque struct and the verifier
will enforce that its contents are never written to by the program.
It can only be written to through specific bpf helper functions.

There are several uses cases for dynamic pointers in bpf programs. A list of
some are: dynamically sized ringbuf reservations without any extra memcpys,
dynamic string parsing and memory comparisons, dynamic memory allocations that
can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
data.

At a high-level, the patches are as follows:
1/6 - Adds MEM_UNINIT as a bpf_type_flag
2/6 - Adds verifier support for dynptrs and implements malloc dynptrs
3/6 - Adds dynptr support for ring buffers
4/6 - Adds bpf_dynptr_read and bpf_dynptr_write
5/6 - Adds dynptr data slices (ptr to underlying dynptr memory)
6/6 - Tests to check that verifier rejects certain fail cases and passes
certain success cases

This is the first dynptr patchset in a larger series. The next series of
patches will add persisting dynamic memory allocations in maps, parsing packet
data through dynptrs, convenience helpers for using dynptrs as iterators, and
more helper functions for interacting with strings and memory dynamically.

Changelog:
----------
v3 -> v2:
v2: https://lore.kernel.org/bpf/20220416063429.3314021-1-joannelkoong@gmail.com/

* Reorder patches (move ringbuffer patch to be right after the verifier + malloc
dynptr patchset)
* Remove local type dynptrs (Andrii + Alexei)
* Mark stack slot as STACK_MISC after any writes into a dynptr instead of
* explicitly prohibiting writes (Alexei)
* Pass number of slots, not memory size to is_spi_bounds_valid (Kumar) 
* Check reference leaks by adding dynptr id to state->refs instead of checking
stack slots (Alexei)

v1 -> v2:
v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/

1/7 -
    * Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
      ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
    * Drop arg_type_is_mem_ptr() wrapper function (Andrii)
2/7 - 
    * Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
    * Use meta.release_ref instead of ref_obj_id != 0 to determine whether
      to release reference (Kumar)
    * Drop type_is_release_mem() wrapper function (Andrii) 
3/7 -
    * Add checks for nested dynptrs edge-cases, which could lead to corrupt
    * writes of the dynptr stack variable.
    * Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
    * Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
      (Alexei)
    * Support alloc flag __GFP_ZERO (Andrii) 
    * Reserve upper 8 bits in dynptr size and offset fields instead of
      reserving just the upper 4 bits (Andrii)
    * Allow dynptr zero-slices (Andrii) 
    * Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
    * Rename check_* functions to is_* functions for better readability
      (Andrii)
    * Add comment for code that checks the spi bounds (Andrii)
4/7 -
    * Fix doc description for bpf_dynpt_read (Toke)
    * Move bpf_dynptr_check_off_len() from function patch 1 to here (Andrii)
5/7 -
    * When finding the id for the dynptr to associate the data slice with,
      look for dynptr arg instead of assuming it is BPF_REG_1.
6/7 -
    * Add __force when casting from unsigned long to void * (kernel test robot)
    * Expand on docs for ringbuf dynptr APIs (Andrii)
7/7 -
    * Use table approach for defining test programs and error messages (Andrii)
    * Print out full log if there’s an error (Andrii)
    * Use bpf_object__find_program_by_name() instead of specifying
      program name as a string (Andrii)
    * Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
      invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
      and test_alloc_zero_bytes
    * Add checking for edge cases (eg allocing with invalid flags)


Joanne Koong (6):
  bpf: Add MEM_UNINIT as a bpf_type_flag
  bpf: Add verifier support for dynptrs and implement malloc dynptrs
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Add dynptr data slices
  bpf: Dynptr tests

 include/linux/bpf.h                           | 103 +++-
 include/linux/bpf_verifier.h                  |  21 +
 include/uapi/linux/bpf.h                      |  96 +++
 kernel/bpf/helpers.c                          | 169 +++++-
 kernel/bpf/ringbuf.c                          |  71 +++
 kernel/bpf/verifier.c                         | 329 +++++++++-
 scripts/bpf_doc.py                            |   2 +
 tools/include/uapi/linux/bpf.h                |  96 +++
 .../testing/selftests/bpf/prog_tests/dynptr.c | 132 ++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 574 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 218 +++++++
 11 files changed, 1774 insertions(+), 37 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-06 15:07   ` David Vernet
  2022-05-06 22:41   ` Andrii Nakryiko
  2022-04-28 21:10 ` [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs Joanne Koong
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.

Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.

In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h   | 17 +++++++++--------
 kernel/bpf/helpers.c  |  4 ++--
 kernel/bpf/verifier.c | 26 ++++++++------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be94833d390a..d0c167865504 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -389,7 +389,9 @@ enum bpf_type_flag {
 	 */
 	PTR_UNTRUSTED		= BIT(6 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= PTR_UNTRUSTED,
+	MEM_UNINIT		= BIT(7 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
 };
 
 /* Max number of base types. */
@@ -408,16 +410,11 @@ enum bpf_arg_type {
 	ARG_CONST_MAP_PTR,	/* const argument used as pointer to bpf_map */
 	ARG_PTR_TO_MAP_KEY,	/* pointer to stack used as map key */
 	ARG_PTR_TO_MAP_VALUE,	/* pointer to stack used as map value */
-	ARG_PTR_TO_UNINIT_MAP_VALUE,	/* pointer to valid memory used to store a map value */
 
-	/* the following constraints used to prototype bpf_memcmp() and other
-	 * functions that access data on eBPF program stack
+	/* Used to prototype bpf_memcmp() and other functions that access data
+	 * on eBPF program stack
 	 */
 	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
-	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
-				 * helper function must fill all bytes or clear
-				 * them in error case.
-				 */
 
 	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
@@ -449,6 +446,10 @@ enum bpf_arg_type {
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
 	ARG_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
+	/* pointer to memory does not need to be initialized, helper function must fill
+	 * all bytes or clear them in error case.
+	 */
+	ARG_PTR_TO_UNINIT_MEM		= MEM_UNINIT | ARG_PTR_TO_MEM,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3e709fed5306..8a2398ac14c2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -103,7 +103,7 @@ const struct bpf_func_proto bpf_map_pop_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE | MEM_UNINIT,
 };
 
 BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
@@ -116,7 +116,7 @@ const struct bpf_func_proto bpf_map_peek_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE | MEM_UNINIT,
 };
 
 const struct bpf_func_proto bpf_get_prandom_u32_proto = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 813f6ee80419..4565684839f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5378,12 +5378,6 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
-static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
-{
-	return base_type(type) == ARG_PTR_TO_MEM ||
-	       base_type(type) == ARG_PTR_TO_UNINIT_MEM;
-}
-
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
@@ -5523,7 +5517,6 @@ static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } }
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
 	[ARG_PTR_TO_MAP_VALUE]		= &map_key_value_types,
-	[ARG_PTR_TO_UNINIT_MAP_VALUE]	= &map_key_value_types,
 	[ARG_CONST_SIZE]		= &scalar_types,
 	[ARG_CONST_SIZE_OR_ZERO]	= &scalar_types,
 	[ARG_CONST_ALLOC_SIZE_OR_ZERO]	= &scalar_types,
@@ -5537,7 +5530,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_BTF_ID]		= &btf_ptr_types,
 	[ARG_PTR_TO_SPIN_LOCK]		= &spin_lock_types,
 	[ARG_PTR_TO_MEM]		= &mem_types,
-	[ARG_PTR_TO_UNINIT_MEM]		= &mem_types,
 	[ARG_PTR_TO_ALLOC_MEM]		= &alloc_mem_types,
 	[ARG_PTR_TO_INT]		= &int_ptr_types,
 	[ARG_PTR_TO_LONG]		= &int_ptr_types,
@@ -5711,8 +5703,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return -EACCES;
 	}
 
-	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-	    base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		err = resolve_map_arg_type(env, meta, &arg_type);
 		if (err)
 			return err;
@@ -5798,8 +5789,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-		   base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
@@ -5811,7 +5801,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "invalid map_ptr to access map->value\n");
 			return -EACCES;
 		}
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
+		meta->raw_mode = arg_type & MEM_UNINIT;
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
@@ -5838,11 +5828,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			return -EACCES;
 	} else if (arg_type == ARG_PTR_TO_FUNC) {
 		meta->subprogno = reg->subprogno;
-	} else if (arg_type_is_mem_ptr(arg_type)) {
+	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MEM);
+		meta->raw_mode = arg_type & MEM_UNINIT;
 	} else if (arg_type_is_mem_size(arg_type)) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
@@ -6189,9 +6179,9 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
 static bool check_args_pair_invalid(enum bpf_arg_type arg_curr,
 				    enum bpf_arg_type arg_next)
 {
-	return (arg_type_is_mem_ptr(arg_curr) &&
+	return (base_type(arg_curr) == ARG_PTR_TO_MEM &&
 	        !arg_type_is_mem_size(arg_next)) ||
-	       (!arg_type_is_mem_ptr(arg_curr) &&
+	       (base_type(arg_curr) != ARG_PTR_TO_MEM &&
 		arg_type_is_mem_size(arg_next));
 }
 
@@ -6203,7 +6193,7 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn)
 	 * helper function specification.
 	 */
 	if (arg_type_is_mem_size(fn->arg1_type) ||
-	    arg_type_is_mem_ptr(fn->arg5_type)  ||
+	    base_type(fn->arg5_type) == ARG_PTR_TO_MEM ||
 	    check_args_pair_invalid(fn->arg1_type, fn->arg2_type) ||
 	    check_args_pair_invalid(fn->arg2_type, fn->arg3_type) ||
 	    check_args_pair_invalid(fn->arg3_type, fn->arg4_type) ||
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
  2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-06 23:30   ` Andrii Nakryiko
  2022-04-28 21:10 ` [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers Joanne Koong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds the bulk of the verifier work for supporting dynamic
pointers (dynptrs) in bpf. This patch implements malloc-type dynptrs
through 2 new APIs (bpf_dynptr_alloc and bpf_dynptr_put) that can be
called by a bpf program. Malloc-type dynptrs are dynptrs that dynamically
allocate memory on behalf of the program.

A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
defined internally as:

struct bpf_dynptr_kern {
    void *data;
    u32 size;
    u32 offset;
} __aligned(8);

The upper 8 bits of *size* is reserved (it contains extra metadata about
read-only status and dynptr type); consequently, a dynptr only supports
memory less than 16 MB.

The 2 new APIs for malloc-type dynptrs are:

long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_dynptr_put(struct bpf_dynptr *ptr);

Please note that there *must* be a corresponding bpf_dynptr_put for
every bpf_dynptr_alloc (even if the alloc fails). This is enforced
by the verifier.

In the verifier, dynptr state information will be tracked in stack
slots. When the program passes in an uninitialized dynptr
(ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
to the frame pointer where the dynptr resides at are marked STACK_DYNPTR.

For helper functions that take in initialized dynptrs (eg
bpf_dynptr_read + bpf_dynptr_write which are added later in this
patchset), the verifier enforces that the dynptr has been initialized
properly by checking that their corresponding stack slots have been marked
as STACK_DYNPTR. Dynptr release functions (eg bpf_dynptr_put) will clear the
stack slots. The verifier enforces at program exit that there are no
referenced dynptrs that haven't been released.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            |  60 ++++++++-
 include/linux/bpf_verifier.h   |  21 +++
 include/uapi/linux/bpf.h       |  30 +++++
 kernel/bpf/helpers.c           |  75 +++++++++++
 kernel/bpf/verifier.c          | 225 ++++++++++++++++++++++++++++++++-
 scripts/bpf_doc.py             |   2 +
 tools/include/uapi/linux/bpf.h |  30 +++++
 7 files changed, 440 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d0c167865504..757440406962 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -391,7 +391,10 @@ enum bpf_type_flag {
 
 	MEM_UNINIT		= BIT(7 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
+	/* DYNPTR points to dynamically allocated memory. */
+	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_MALLOC,
 };
 
 /* Max number of base types. */
@@ -436,6 +439,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
 	ARG_PTR_TO_KPTR,	/* pointer to referenced kptr */
+	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -2347,4 +2351,58 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
 
+/* the implementation of the opaque uapi struct bpf_dynptr */
+struct bpf_dynptr_kern {
+	void *data;
+	/* Size represents the number of usable bytes in the dynptr.
+	 * If for example the offset is at 200 for a malloc dynptr with
+	 * allocation size 256, the number of usable bytes is 56.
+	 *
+	 * The upper 8 bits are reserved.
+	 * Bit 31 denotes whether the dynptr is read-only.
+	 * Bits 28-30 denote the dynptr type.
+	 */
+	u32 size;
+	u32 offset;
+} __aligned(8);
+
+enum bpf_dynptr_type {
+	BPF_DYNPTR_TYPE_INVALID,
+	/* Memory allocated dynamically by the kernel for the dynptr */
+	BPF_DYNPTR_TYPE_MALLOC,
+};
+
+/* Since the upper 8 bits of dynptr->size is reserved, the
+ * maximum supported size is 2^24 - 1.
+ */
+#define DYNPTR_MAX_SIZE	((1UL << 24) - 1)
+#define DYNPTR_SIZE_MASK	0xFFFFFF
+#define DYNPTR_TYPE_SHIFT	28
+#define DYNPTR_TYPE_MASK	0x7
+
+static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
+{
+	return (ptr->size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK;
+}
+
+static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
+{
+	ptr->size |= type << DYNPTR_TYPE_SHIFT;
+}
+
+static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_SIZE_MASK;
+}
+
+static inline int bpf_dynptr_check_size(u32 size)
+{
+	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
+}
+
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size);
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1f1e7f2ea967..830a0e11ae97 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -72,6 +72,18 @@ struct bpf_reg_state {
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 
+		/* For dynptr stack slots */
+		struct {
+			enum bpf_dynptr_type type;
+			/* A dynptr is 16 bytes so it takes up 2 stack slots.
+			 * We need to track which slot is the first slot
+			 * to protect against cases where the user may try to
+			 * pass in an address starting at the second slot of the
+			 * dynptr.
+			 */
+			bool first_slot;
+		} dynptr;
+
 		/* Max size from any of the above. */
 		struct {
 			unsigned long raw1;
@@ -88,6 +100,8 @@ struct bpf_reg_state {
 	 * for the purpose of tracking that it's freed.
 	 * For PTR_TO_SOCKET this is used to share which pointers retain the
 	 * same reference to the socket, to determine proper reference freeing.
+	 * For stack slots that are dynptrs, this is used to track references to
+	 * the dynptr to determine proper reference freeing.
 	 */
 	u32 id;
 	/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
@@ -174,9 +188,16 @@ enum bpf_stack_slot_type {
 	STACK_SPILL,      /* register spilled into stack */
 	STACK_MISC,	  /* BPF program wrote some data into this slot */
 	STACK_ZERO,	  /* BPF program wrote constant zero */
+	/* A dynptr is stored in this stack slot. The type of dynptr
+	 * is stored in bpf_stack_state->spilled_ptr.dynptr.type
+	 */
+	STACK_DYNPTR,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
+/* size of a struct bpf_dynptr in bytes */
+#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
+#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
 
 struct bpf_stack_state {
 	struct bpf_reg_state spilled_ptr;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 444fe6f1cf35..5a87ed654016 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5154,6 +5154,29 @@ union bpf_attr {
  *		if not NULL, is a reference which must be released using its
  *		corresponding release function, or moved into a BPF map before
  *		program exit.
+ *
+ * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_dynptr_alloc must have a corresponding
+ *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		Supported *flags* are __GFP_ZERO.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
+ *		if the flags is not supported.
+ *
+ * void bpf_dynptr_put(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_dynptr_alloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5351,6 +5374,8 @@ union bpf_attr {
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
+	FN(dynptr_alloc),		\
+	FN(dynptr_put),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6498,6 +6523,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8a2398ac14c2..a4272e9239ea 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1396,6 +1396,77 @@ const struct bpf_func_proto bpf_kptr_xchg_proto = {
 	.arg2_btf_id  = BPF_PTR_POISON,
 };
 
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size)
+{
+	ptr->data = data;
+	ptr->offset = offset;
+	ptr->size = size;
+	bpf_dynptr_set_type(ptr, type);
+}
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
+{
+	memset(ptr, 0, sizeof(*ptr));
+}
+
+BPF_CALL_3(bpf_dynptr_alloc, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
+{
+	gfp_t gfp_flags = GFP_ATOMIC;
+	void *data;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err)
+		goto error;
+
+	if (flags) {
+		if (flags == __GFP_ZERO) {
+			gfp_flags |= flags;
+		} else {
+			err = -EINVAL;
+			goto error;
+		}
+	}
+
+	data = kmalloc(size, gfp_flags);
+	if (!data) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
+
+	return 0;
+
+error:
+	bpf_dynptr_set_null(ptr);
+	return err;
+}
+
+const struct bpf_func_proto bpf_dynptr_alloc_proto = {
+	.func		= bpf_dynptr_alloc,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
+};
+
+BPF_CALL_1(bpf_dynptr_put, struct bpf_dynptr_kern *, dynptr)
+{
+	kfree(dynptr->data);
+	bpf_dynptr_set_null(dynptr);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_put_proto = {
+	.func		= bpf_dynptr_put,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1448,6 +1519,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_loop_proto;
 	case BPF_FUNC_strncmp:
 		return &bpf_strncmp_proto;
+	case BPF_FUNC_dynptr_alloc:
+		return &bpf_dynptr_alloc_proto;
+	case BPF_FUNC_dynptr_put:
+		return &bpf_dynptr_put_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4565684839f1..16b7ea54a7e0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -187,6 +187,11 @@ struct bpf_verifier_stack_elem {
 					  POISON_POINTER_DELTA))
 #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
 
+/* forward declarations */
+static bool arg_type_is_mem_size(enum bpf_arg_type type);
+static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
+static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
+
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
 	return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
@@ -259,6 +264,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct bpf_map_value_off_desc *kptr_off_desc;
+	u8 uninit_dynptr_regno;
 };
 
 struct btf *btf_vmlinux;
@@ -580,6 +586,7 @@ static char slot_type_char[] = {
 	[STACK_SPILL]	= 'r',
 	[STACK_MISC]	= 'm',
 	[STACK_ZERO]	= '0',
+	[STACK_DYNPTR]	= 'd',
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -595,6 +602,25 @@ static void print_liveness(struct bpf_verifier_env *env,
 		verbose(env, "D");
 }
 
+static inline int get_spi(s32 off)
+{
+	return (-off - 1) / BPF_REG_SIZE;
+}
+
+static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, int nr_slots)
+{
+	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+
+	/* We need to check that slots between [spi - nr_slots + 1, spi] are
+	 * within [0, allocated_stack).
+	 *
+	 * Please note that the spi grows downwards. For example, a dynptr
+	 * takes the size of two stack slots; the first slot will be at
+	 * spi and the second slot will be at spi - 1.
+	 */
+	return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
+}
+
 static struct bpf_func_state *func(struct bpf_verifier_env *env,
 				   const struct bpf_reg_state *reg)
 {
@@ -646,6 +672,134 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
+#define DYNPTR_TYPE_FLAG_MASK		DYNPTR_TYPE_MALLOC
+
+static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
+{
+	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
+	case DYNPTR_TYPE_MALLOC:
+		return BPF_DYNPTR_TYPE_MALLOC;
+	default:
+		return BPF_DYNPTR_TYPE_INVALID;
+	}
+}
+
+static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
+{
+	return type == BPF_DYNPTR_TYPE_MALLOC;
+}
+
+static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				   enum bpf_arg_type arg_type, int insn_idx)
+{
+	struct bpf_func_state *state = cur_func(env);
+	enum bpf_dynptr_type type;
+	int spi, id, i;
+
+	spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_DYNPTR;
+		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
+	}
+
+	type = arg_to_dynptr_type(arg_type);
+	if (type == BPF_DYNPTR_TYPE_INVALID)
+		return -EINVAL;
+
+	state->stack[spi].spilled_ptr.dynptr.type = type;
+	state->stack[spi - 1].spilled_ptr.dynptr.type = type;
+
+	state->stack[spi].spilled_ptr.dynptr.first_slot = true;
+
+	if (dynptr_type_refcounted(type)) {
+		/* The id is used to track proper releasing */
+		id = acquire_reference_state(env, insn_idx);
+		if (id < 0)
+			return id;
+
+		state->stack[spi].spilled_ptr.id = id;
+		state->stack[spi - 1].spilled_ptr.id = id;
+	}
+
+	return 0;
+}
+
+static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i;
+
+	spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_INVALID;
+		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
+	}
+
+	/* Invalidate any slices associated with this dynptr */
+	if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
+		release_reference(env, state->stack[spi].spilled_ptr.id);
+		state->stack[spi].spilled_ptr.id = 0;
+		state->stack[spi - 1].spilled_ptr.id = 0;
+	}
+
+	state->stack[spi].spilled_ptr.dynptr.type = 0;
+	state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
+
+	state->stack[spi].spilled_ptr.dynptr.first_slot = false;
+
+	return 0;
+}
+
+static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+	int i;
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return true;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		if (state->stack[spi].slot_type[i] == STACK_DYNPTR ||
+		    state->stack[spi - 1].slot_type[i] == STACK_DYNPTR)
+			return false;
+	}
+
+	return true;
+}
+
+static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				     enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+	int i;
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
+	    !state->stack[spi].spilled_ptr.dynptr.first_slot)
+		return false;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+	    if (state->stack[spi].slot_type[i] != STACK_DYNPTR ||
+		state->stack[spi - 1].slot_type[i] != STACK_DYNPTR)
+		    return false;
+	}
+
+	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
+	if (arg_type == ARG_PTR_TO_DYNPTR)
+		return true;
+
+	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
+}
+
 /* The reg state of a pointer or a bounded scalar was saved when
  * it was spilled to the stack.
  */
@@ -5400,6 +5554,16 @@ static bool arg_type_is_release(enum bpf_arg_type type)
 	return type & OBJ_RELEASE;
 }
 
+static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
+{
+	return base_type(type) == ARG_PTR_TO_DYNPTR;
+}
+
+static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
+{
+	return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
+}
+
 static int int_ptr_type_to_size(enum bpf_arg_type type)
 {
 	if (type == ARG_PTR_TO_INT)
@@ -5539,6 +5703,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
 	[ARG_PTR_TO_KPTR]		= &kptr_types,
+	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@@ -5725,7 +5890,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 
 skip_type_check:
 	if (arg_type_is_release(arg_type)) {
-		if (!reg->ref_obj_id && !register_is_null(reg)) {
+		if (arg_type_is_dynptr(arg_type)) {
+			struct bpf_func_state *state = func(env, reg);
+			int spi = get_spi(reg->off);
+
+			if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
+			    !state->stack[spi].spilled_ptr.id) {
+				verbose(env, "arg %d is an unacquired reference\n", regno);
+				return -EINVAL;
+			}
+		} else if (!reg->ref_obj_id && !register_is_null(reg)) {
 			verbose(env, "R%d must be referenced when passed to release function\n",
 				regno);
 			return -EINVAL;
@@ -5837,6 +6011,35 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
 		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
+	} else if (arg_type_is_dynptr(arg_type)) {
+		/* Can't pass in a dynptr at a weird offset */
+		if (reg->off % BPF_REG_SIZE) {
+			verbose(env, "cannot pass in non-zero dynptr offset\n");
+			return -EINVAL;
+		}
+
+		if (arg_type & MEM_UNINIT)  {
+			if (!is_dynptr_reg_valid_uninit(env, reg)) {
+				verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
+					arg + BPF_REG_1);
+				return -EINVAL;
+			}
+
+			meta->uninit_dynptr_regno = arg + BPF_REG_1;
+		} else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
+			const char *err_extra = "";
+
+			switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
+			case DYNPTR_TYPE_MALLOC:
+				err_extra = "malloc ";
+				break;
+			default:
+				break;
+			}
+			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
+				err_extra, arg + BPF_REG_1);
+			return -EINVAL;
+		}
 	} else if (arg_type_is_alloc_size(arg_type)) {
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
@@ -6965,9 +7168,27 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 	regs = cur_regs(env);
 
+	if (meta.uninit_dynptr_regno) {
+		/* we write BPF_DW bits (8 bytes) at a time */
+		for (i = 0; i < BPF_DYNPTR_SIZE; i += 8) {
+			err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
+					       i, BPF_DW, BPF_WRITE, -1, false);
+			if (err)
+				return err;
+		}
+
+		err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno],
+					      fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1],
+					      insn_idx);
+		if (err)
+			return err;
+	}
+
 	if (meta.release_regno) {
 		err = -EINVAL;
-		if (meta.ref_obj_id)
+		if (arg_type_is_dynptr(fn->arg_type[meta.release_regno - BPF_REG_1]))
+			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
+		else if (meta.ref_obj_id)
 			err = release_reference(env, meta.ref_obj_id);
 		/* meta.ref_obj_id can only be 0 if register that is meant to be
 		 * released is NULL, which must be > R0.
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 096625242475..766dcbc73897 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     ]
     known_types = {
             '...',
@@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 444fe6f1cf35..5a87ed654016 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5154,6 +5154,29 @@ union bpf_attr {
  *		if not NULL, is a reference which must be released using its
  *		corresponding release function, or moved into a BPF map before
  *		program exit.
+ *
+ * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_dynptr_alloc must have a corresponding
+ *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		Supported *flags* are __GFP_ZERO.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
+ *		if the flags is not supported.
+ *
+ * void bpf_dynptr_put(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_dynptr_alloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5351,6 +5374,8 @@ union bpf_attr {
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
 	FN(kptr_xchg),			\
+	FN(dynptr_alloc),		\
+	FN(dynptr_put),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6498,6 +6523,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
  2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
  2022-04-28 21:10 ` [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-06 23:41   ` Andrii Nakryiko
  2022-04-28 21:10 ` [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known since the verifier cannot guarantee that the bpf
program won’t access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionalities of the original ringbuf APIs.
For example, all ringbuffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 10 ++++-
 include/uapi/linux/bpf.h       | 35 +++++++++++++++++
 kernel/bpf/helpers.c           |  6 +++
 kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          | 18 +++++++--
 tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
 6 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 757440406962..10efbec99e93 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -394,7 +394,10 @@ enum bpf_type_flag {
 	/* DYNPTR points to dynamically allocated memory. */
 	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_MALLOC,
+	/* DYNPTR points to a ringbuf record. */
+	DYNPTR_TYPE_RINGBUF	= BIT(9 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_RINGBUF,
 };
 
 /* Max number of base types. */
@@ -2203,6 +2206,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
 extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
 extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
 extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
@@ -2370,6 +2376,8 @@ enum bpf_dynptr_type {
 	BPF_DYNPTR_TYPE_INVALID,
 	/* Memory allocated dynamically by the kernel for the dynptr */
 	BPF_DYNPTR_TYPE_MALLOC,
+	/* Underlying data is a ringbuf record */
+	BPF_DYNPTR_TYPE_RINGBUF,
 };
 
 /* Since the upper 8 bits of dynptr->size is reserved, the
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5a87ed654016..679f960d2514 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5177,6 +5177,38 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *data*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5376,6 +5408,9 @@ union bpf_attr {
 	FN(kptr_xchg),			\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a4272e9239ea..2d6f2e28b580 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1513,6 +1513,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_ringbuf_discard_proto;
 	case BPF_FUNC_ringbuf_query:
 		return &bpf_ringbuf_query_proto;
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+		return &bpf_ringbuf_reserve_dynptr_proto;
+	case BPF_FUNC_ringbuf_submit_dynptr:
+		return &bpf_ringbuf_submit_dynptr_proto;
+	case BPF_FUNC_ringbuf_discard_dynptr:
+		return &bpf_ringbuf_discard_dynptr_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_loop:
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 311264ab80c4..685bee459525 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -475,3 +475,74 @@ const struct bpf_func_proto bpf_ringbuf_query_proto = {
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
+	   struct bpf_dynptr_kern *, ptr)
+{
+	void *sample;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags);
+
+	if (!sample) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
+	.func		= bpf_ringbuf_reserve_dynptr,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
+};
+
+BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_submit(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
+	.func		= bpf_ringbuf_submit_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_discard(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
+	.func		= bpf_ringbuf_discard_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 16b7ea54a7e0..1b2ec1049368 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -672,13 +672,15 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
-#define DYNPTR_TYPE_FLAG_MASK		DYNPTR_TYPE_MALLOC
+#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_MALLOC | DYNPTR_TYPE_RINGBUF)
 
 static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 {
 	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
 	case DYNPTR_TYPE_MALLOC:
 		return BPF_DYNPTR_TYPE_MALLOC;
+	case DYNPTR_TYPE_RINGBUF:
+		return BPF_DYNPTR_TYPE_RINGBUF;
 	default:
 		return BPF_DYNPTR_TYPE_INVALID;
 	}
@@ -686,7 +688,7 @@ static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 
 static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
 {
-	return type == BPF_DYNPTR_TYPE_MALLOC;
+	return type == BPF_DYNPTR_TYPE_MALLOC || type == BPF_DYNPTR_TYPE_RINGBUF;
 }
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
@@ -6033,9 +6035,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			case DYNPTR_TYPE_MALLOC:
 				err_extra = "malloc ";
 				break;
+			case DYNPTR_TYPE_RINGBUF:
+				err_extra = "ringbuf ";
+				break;
 			default:
 				break;
 			}
+
 			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
 				err_extra, arg + BPF_REG_1);
 			return -EINVAL;
@@ -6161,7 +6167,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_RINGBUF:
 		if (func_id != BPF_FUNC_ringbuf_output &&
 		    func_id != BPF_FUNC_ringbuf_reserve &&
-		    func_id != BPF_FUNC_ringbuf_query)
+		    func_id != BPF_FUNC_ringbuf_query &&
+		    func_id != BPF_FUNC_ringbuf_reserve_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_submit_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_discard_dynptr)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -6277,6 +6286,9 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_ringbuf_output:
 	case BPF_FUNC_ringbuf_reserve:
 	case BPF_FUNC_ringbuf_query:
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+	case BPF_FUNC_ringbuf_submit_dynptr:
+	case BPF_FUNC_ringbuf_discard_dynptr:
 		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
 			goto error;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5a87ed654016..679f960d2514 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5177,6 +5177,38 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *data*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5376,6 +5408,9 @@ union bpf_attr {
 	FN(kptr_xchg),			\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
                   ` (2 preceding siblings ...)
  2022-04-28 21:10 ` [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-06 23:48   ` Andrii Nakryiko
  2022-04-28 21:10 ` [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices Joanne Koong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds two helper functions, bpf_dynptr_read and
bpf_dynptr_write:

long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);

long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);

The dynptr passed into these functions must be valid dynptrs that have
been initialized.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 16 ++++++++++
 include/uapi/linux/bpf.h       | 19 ++++++++++++
 kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 19 ++++++++++++
 4 files changed, 110 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 10efbec99e93..b276dbf942dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2387,6 +2387,12 @@ enum bpf_dynptr_type {
 #define DYNPTR_SIZE_MASK	0xFFFFFF
 #define DYNPTR_TYPE_SHIFT	28
 #define DYNPTR_TYPE_MASK	0x7
+#define DYNPTR_RDONLY_BIT	BIT(31)
+
+static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_RDONLY_BIT;
+}
 
 static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
 {
@@ -2408,6 +2414,16 @@ static inline int bpf_dynptr_check_size(u32 size)
 	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
 }
 
+static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
+{
+	u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
+
+	if (len > capacity || offset > capacity - len)
+		return -EINVAL;
+
+	return 0;
+}
+
 void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
 		     u32 offset, u32 size);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 679f960d2514..2d539930b7b2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5209,6 +5209,23 @@ union bpf_attr {
  *		'bpf_ringbuf_discard'.
  *	Return
  *		Nothing. Always succeeds.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
+ *		is a read-only dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5411,6 +5428,8 @@ union bpf_attr {
 	FN(ringbuf_reserve_dynptr),	\
 	FN(ringbuf_submit_dynptr),	\
 	FN(ringbuf_discard_dynptr),	\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 2d6f2e28b580..7206b9e5322f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1467,6 +1467,58 @@ const struct bpf_func_proto bpf_dynptr_put_proto = {
 	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
 };
 
+BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
+{
+	int err;
+
+	if (!src->data)
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(src, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst, src->data + src->offset + offset, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_read_proto = {
+	.func		= bpf_dynptr_read,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_PTR_TO_DYNPTR,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src, u32, len)
+{
+	int err;
+
+	if (!dst->data || bpf_dynptr_is_rdonly(dst))
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(dst, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst->data + dst->offset + offset, src, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_write_proto = {
+	.func		= bpf_dynptr_write,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1529,6 +1581,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_alloc_proto;
 	case BPF_FUNC_dynptr_put:
 		return &bpf_dynptr_put_proto;
+	case BPF_FUNC_dynptr_read:
+		return &bpf_dynptr_read_proto;
+	case BPF_FUNC_dynptr_write:
+		return &bpf_dynptr_write_proto;
 	default:
 		break;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 679f960d2514..2d539930b7b2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5209,6 +5209,23 @@ union bpf_attr {
  *		'bpf_ringbuf_discard'.
  *	Return
  *		Nothing. Always succeeds.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
+ *		is a read-only dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5411,6 +5428,8 @@ union bpf_attr {
 	FN(ringbuf_reserve_dynptr),	\
 	FN(ringbuf_submit_dynptr),	\
 	FN(ringbuf_discard_dynptr),	\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
                   ` (3 preceding siblings ...)
  2022-04-28 21:10 ` [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-06 23:57   ` Andrii Nakryiko
  2022-04-28 21:10 ` [PATCH bpf-next v3 6/6] bpf: Dynptr tests Joanne Koong
  2022-05-06 22:35 ` [PATCH bpf-next v3 0/6] Dynamic pointers Andrii Nakryiko
  6 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds a new helper function

void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);

which returns a pointer to the underlying data of a dynptr. *len*
must be a statically known value. The bpf program may access the returned
data slice as a normal buffer (eg can do direct reads and writes), since
the verifier associates the length with the returned pointer, and
enforces that no out of bounds accesses occur.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            |  4 +++
 include/uapi/linux/bpf.h       | 12 +++++++
 kernel/bpf/helpers.c           | 28 +++++++++++++++
 kernel/bpf/verifier.c          | 64 ++++++++++++++++++++++++++++++----
 tools/include/uapi/linux/bpf.h | 12 +++++++
 5 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b276dbf942dd..4d2de868bdbc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -397,6 +397,9 @@ enum bpf_type_flag {
 	/* DYNPTR points to a ringbuf record. */
 	DYNPTR_TYPE_RINGBUF	= BIT(9 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is memory owned by a dynptr */
+	MEM_DYNPTR		= BIT(10 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_RINGBUF,
 };
 
@@ -484,6 +487,7 @@ enum bpf_return_type {
 	RET_PTR_TO_TCP_SOCK_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK,
 	RET_PTR_TO_SOCK_COMMON_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON,
 	RET_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM,
+	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_DYNPTR | RET_PTR_TO_ALLOC_MEM,
 	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2d539930b7b2..e3a7c85cc572 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5226,6 +5226,17 @@ union bpf_attr {
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
  *		is a read-only dynptr.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the dynptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		is out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5430,6 +5441,7 @@ union bpf_attr {
 	FN(ringbuf_discard_dynptr),	\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7206b9e5322f..065815b9fb9f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1519,6 +1519,32 @@ const struct bpf_func_proto bpf_dynptr_write_proto = {
 	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
+BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
+{
+	int err;
+
+	if (!ptr->data)
+		return 0;
+
+	err = bpf_dynptr_check_off_len(ptr, offset, len);
+	if (err)
+		return 0;
+
+	if (bpf_dynptr_is_rdonly(ptr))
+		return 0;
+
+	return (unsigned long)(ptr->data + ptr->offset + offset);
+}
+
+const struct bpf_func_proto bpf_dynptr_data_proto = {
+	.func		= bpf_dynptr_data,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_DYNPTR_MEM_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1585,6 +1611,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_read_proto;
 	case BPF_FUNC_dynptr_write:
 		return &bpf_dynptr_write_proto;
+	case BPF_FUNC_dynptr_data:
+		return &bpf_dynptr_data_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1b2ec1049368..3d5b35449113 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -485,7 +485,8 @@ static bool may_be_acquire_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_sk_lookup_udp ||
 		func_id == BPF_FUNC_skc_lookup_tcp ||
 		func_id == BPF_FUNC_map_lookup_elem ||
-	        func_id == BPF_FUNC_ringbuf_reserve;
+		func_id == BPF_FUNC_ringbuf_reserve ||
+		func_id == BPF_FUNC_dynptr_data;
 }
 
 static bool is_acquire_function(enum bpf_func_id func_id,
@@ -497,7 +498,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
 	    func_id == BPF_FUNC_sk_lookup_udp ||
 	    func_id == BPF_FUNC_skc_lookup_tcp ||
 	    func_id == BPF_FUNC_ringbuf_reserve ||
-	    func_id == BPF_FUNC_kptr_xchg)
+	    func_id == BPF_FUNC_kptr_xchg ||
+	    func_id == BPF_FUNC_dynptr_data)
 		return true;
 
 	if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -519,6 +521,11 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_skc_to_tcp_request_sock;
 }
 
+static inline bool is_dynptr_ref_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_dynptr_data;
+}
+
 static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 {
 	return BPF_CLASS(insn->code) == BPF_STX &&
@@ -569,6 +576,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "rdonly_", 32);
 	if (type & MEM_ALLOC)
 		strncpy(prefix, "alloc_", 32);
+	if (type & MEM_DYNPTR)
+		strncpy(prefix, "dynptr_", 32);
 	if (type & MEM_USER)
 		strncpy(prefix, "user_", 32);
 	if (type & MEM_PERCPU)
@@ -802,6 +811,20 @@ static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_re
 	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
 }
 
+static bool is_ref_obj_id_dynptr(struct bpf_func_state *state, u32 id)
+{
+	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+	int i;
+
+	for (i = 0; i < allocated_slots; i++) {
+		if (state->stack[i].slot_type[0] == STACK_DYNPTR &&
+		    state->stack[i].spilled_ptr.id == id)
+			return true;
+	}
+
+	return false;
+}
+
 /* The reg state of a pointer or a bounded scalar was saved when
  * it was spilled to the stack.
  */
@@ -5652,6 +5675,7 @@ static const struct bpf_reg_types mem_types = {
 		PTR_TO_MAP_VALUE,
 		PTR_TO_MEM,
 		PTR_TO_MEM | MEM_ALLOC,
+		PTR_TO_MEM | MEM_DYNPTR,
 		PTR_TO_BUF,
 	},
 };
@@ -5804,6 +5828,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_MEM:
 	case PTR_TO_MEM | MEM_RDONLY:
 	case PTR_TO_MEM | MEM_ALLOC:
+	case PTR_TO_MEM | MEM_DYNPTR:
 	case PTR_TO_BUF:
 	case PTR_TO_BUF | MEM_RDONLY:
 	case PTR_TO_STACK:
@@ -5838,6 +5863,14 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
 }
 
+static inline u32 stack_slot_get_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+
+	return state->stack[spi].spilled_ptr.id;
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn)
@@ -7370,10 +7403,28 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
 	} else if (is_acquire_function(func_id, meta.map_ptr)) {
-		int id = acquire_reference_state(env, insn_idx);
+		int id;
+
+		if (is_dynptr_ref_function(func_id)) {
+			int i;
+
+			/* Find the id of the dynptr we're acquiring a reference to */
+			for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
+				if (arg_type_is_dynptr(fn->arg_type[i])) {
+					id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
+					break;
+				}
+			}
+			if (unlikely(i == MAX_BPF_FUNC_REG_ARGS)) {
+				verbose(env, "verifier internal error: no dynptr args to a dynptr ref function");
+				return -EFAULT;
+			}
+		} else {
+			id = acquire_reference_state(env, insn_idx);
+			if (id < 0)
+				return id;
+		}
 
-		if (id < 0)
-			return id;
 		/* For mark_ptr_or_null_reg() */
 		regs[BPF_REG_0].id = id;
 		/* For release_reference() */
@@ -9809,7 +9860,8 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	u32 id = regs[regno].id;
 	int i;
 
-	if (ref_obj_id && ref_obj_id == id && is_null)
+	if (ref_obj_id && ref_obj_id == id && is_null &&
+	    !is_ref_obj_id_dynptr(state, id))
 		/* regs[regno] is in the " == NULL" branch.
 		 * No one could have freed the reference state before
 		 * doing the NULL check.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2d539930b7b2..e3a7c85cc572 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5226,6 +5226,17 @@ union bpf_attr {
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
  *		is a read-only dynptr.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the dynptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		is out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5430,6 +5441,7 @@ union bpf_attr {
 	FN(ringbuf_discard_dynptr),	\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v3 6/6] bpf: Dynptr tests
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
                   ` (4 preceding siblings ...)
  2022-04-28 21:10 ` [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices Joanne Koong
@ 2022-04-28 21:10 ` Joanne Koong
  2022-05-07  0:09   ` Andrii Nakryiko
  2022-05-06 22:35 ` [PATCH bpf-next v3 0/6] Dynamic pointers Andrii Nakryiko
  6 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-04-28 21:10 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds tests for dynptrs, which include cases that the
verifier needs to reject (for example, invalid bpf_dynptr_put usages
and  invalid writes/reads), as well as cases that should successfully
pass.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/dynptr.c | 132 ++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 574 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 218 +++++++
 3 files changed, 924 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
new file mode 100644
index 000000000000..0bed39fd8dac
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <test_progs.h>
+#include "dynptr_fail.skel.h"
+#include "dynptr_success.skel.h"
+
+size_t log_buf_sz = 1048576; /* 1 MB */
+static char obj_log_buf[1048576];
+
+struct {
+	const char *prog_name;
+	const char *expected_err_msg;
+} dynptr_tests[] = {
+	/* failure cases */
+	{"missing_put", "Unreleased reference id=1"},
+	{"missing_put_callback", "Unreleased reference id=1"},
+	{"put_nonalloc", "Expected an initialized malloc dynptr as arg #1"},
+	{"put_data_slice", "type=dynptr_mem expected=fp"},
+	{"put_uninit_dynptr", "arg 1 is an unacquired reference"},
+	{"use_after_put", "Expected an initialized dynptr as arg #3"},
+	{"alloc_twice", "Arg #3 dynptr has to be an uninitialized dynptr"},
+	{"add_dynptr_to_map1", "invalid indirect read from stack"},
+	{"add_dynptr_to_map2", "invalid indirect read from stack"},
+	{"ringbuf_invalid_access", "invalid mem access 'scalar'"},
+	{"ringbuf_invalid_api", "type=dynptr_mem expected=alloc_mem"},
+	{"ringbuf_out_of_bounds", "value is outside of the allowed memory range"},
+	{"data_slice_out_of_bounds", "value is outside of the allowed memory range"},
+	{"data_slice_use_after_put", "invalid mem access 'scalar'"},
+	{"invalid_helper1", "invalid indirect read from stack"},
+	{"invalid_helper2", "Expected an initialized dynptr as arg #3"},
+	{"invalid_write1", "Expected an initialized malloc dynptr as arg #1"},
+	{"invalid_write2", "Expected an initialized dynptr as arg #3"},
+	{"invalid_write3", "Expected an initialized malloc dynptr as arg #1"},
+	{"invalid_write4", "arg 1 is an unacquired reference"},
+	{"invalid_read1", "invalid read from stack"},
+	{"invalid_read2", "cannot pass in non-zero dynptr offset"},
+	{"invalid_read3", "invalid read from stack"},
+	{"invalid_offset", "invalid write to stack"},
+	{"global", "R3 type=map_value expected=fp"},
+	{"put_twice", "arg 1 is an unacquired reference"},
+	{"put_twice_callback", "arg 1 is an unacquired reference"},
+	{"zero_slice_access", "invalid access to memory, mem_size=0 off=0 size=1"},
+	/* success cases */
+	{"test_basic", NULL},
+	{"test_data_slice", NULL},
+	{"test_ringbuf", NULL},
+	{"test_alloc_zero_bytes", NULL},
+};
+
+static void verify_fail(const char *prog_name, const char *expected_err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct bpf_program *prog;
+	struct dynptr_fail *skel;
+	int err;
+
+	opts.kernel_log_buf = obj_log_buf;
+	opts.kernel_log_size = log_buf_sz;
+	opts.kernel_log_level = 1;
+
+	skel = dynptr_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "dynptr_fail__open_opts"))
+		goto cleanup;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	bpf_program__set_autoload(prog, true);
+
+	err = dynptr_fail__load(skel);
+	if (!ASSERT_ERR(err, "unexpected load success"))
+		goto cleanup;
+
+	if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
+		fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
+		fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
+	}
+
+cleanup:
+	dynptr_fail__destroy(skel);
+}
+
+static void verify_success(const char *prog_name)
+{
+	struct dynptr_success *skel;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+
+	skel = dynptr_success__open();
+	if (!ASSERT_OK_PTR(skel, "dynptr_success__open"))
+		return;
+
+	skel->bss->pid = getpid();
+
+	dynptr_success__load(skel);
+	if (!ASSERT_OK_PTR(skel, "dynptr_success__load"))
+		goto cleanup;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
+		goto cleanup;
+
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->err, 0, "err");
+
+	bpf_link__destroy(link);
+
+cleanup:
+	dynptr_success__destroy(skel);
+}
+
+void test_dynptr(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(dynptr_tests); i++) {
+		if (!test__start_subtest(dynptr_tests[i].prog_name))
+			continue;
+
+		if (dynptr_tests[i].expected_err_msg)
+			verify_fail(dynptr_tests[i].prog_name, dynptr_tests[i].expected_err_msg);
+		else
+			verify_success(dynptr_tests[i].prog_name);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
new file mode 100644
index 000000000000..e4d5464e1865
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -0,0 +1,574 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct bpf_dynptr);
+} array_map SEC(".maps");
+
+struct sample {
+	int pid;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+int err = 0;
+int val;
+
+/* Every bpf_dynptr_alloc call must have a corresponding bpf_dynptr_put call */
+SEC("?raw_tp/sys_nanosleep")
+int missing_put(void *ctx)
+{
+	struct bpf_dynptr mem;
+
+	bpf_dynptr_alloc(8, 0, &mem);
+
+	/* missing a call to bpf_dynptr_put(&mem) */
+
+	return 0;
+}
+
+/* A non-alloc-ed dynptr can't be used by bpf_dynptr_put */
+SEC("?raw_tp/sys_nanosleep")
+int put_nonalloc(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A data slice from a dynptr can't be used by bpf_dynptr_put */
+SEC("?raw_tp/sys_nanosleep")
+int put_data_slice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* this should fail */
+	bpf_dynptr_put(data);
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/* Can't call bpf_dynptr_put on a non-initialized dynptr */
+SEC("?raw_tp/sys_nanosleep")
+int put_uninit_dynptr(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A dynptr can't be used after bpf_dynptr_put has been called on it */
+SEC("?raw_tp/sys_nanosleep")
+int use_after_put(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	return 0;
+}
+
+/*
+ * Can't bpf_dynptr_alloc an existing allocated bpf_dynptr that bpf_dynptr_put
+ * hasn't been called on yet
+ */
+SEC("?raw_tp/sys_nanosleep")
+int alloc_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_alloc(2, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/*
+ * Can't access a ring buffer record after submit or discard has been called
+ * on the dynptr
+ */
+SEC("?raw_tp/sys_nanosleep")
+int ringbuf_invalid_access(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	/* this should fail */
+	err = sample->pid;
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't call non-dynptr ringbuf APIs on a dynptr ringbuf sample */
+SEC("?raw_tp/sys_nanosleep")
+int ringbuf_invalid_api(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	/* invalid API use. need to use dynptr API to submit/discard */
+	bpf_ringbuf_submit(sample, 0);
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't access memory outside a ringbuf record range */
+SEC("?raw_tp/sys_nanosleep")
+int ringbuf_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	/* Can't access beyond sample range */
+	*(__u8 *)((void *)sample + sizeof(*sample)) = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't add a dynptr to a map */
+SEC("?raw_tp/sys_nanosleep")
+int add_dynptr_to_map1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	int key = 0;
+
+	err = bpf_dynptr_alloc(sizeof(val), 0, &ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &ptr, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* Can't add a struct with an embedded dynptr to a map */
+SEC("?raw_tp/sys_nanosleep")
+int add_dynptr_to_map2(void *ctx)
+{
+	struct info {
+		int x;
+		struct bpf_dynptr ptr;
+	};
+	struct info x;
+	int key = 0;
+
+	bpf_dynptr_alloc(sizeof(val), 0, &x.ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &x, 0);
+
+	return 0;
+}
+
+/* Can't pass in a dynptr as an arg to a helper function that doesn't take in a
+ * dynptr argument
+ */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_helper1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	/* this should fail */
+	bpf_strncmp((const char *)&ptr, sizeof(ptr), "hello!");
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A dynptr can't be passed into a helper function at a non-zero offset */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_helper2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+
+	bpf_dynptr_alloc(sizeof(val), 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A data slice can't be accessed out of bounds */
+SEC("?raw_tp/sys_nanosleep")
+int data_slice_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* can't index out of bounds of the data slice */
+	val = *((char *)data + 8);
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/* A data slice can't be used after bpf_dynptr_put is called */
+SEC("?raw_tp/sys_nanosleep")
+int data_slice_use_after_put(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	bpf_dynptr_put(&ptr);
+
+	/* this should fail */
+	val = *(__u8 *)data;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/* A bpf_dynptr can't be used as a dynptr if it's been written into */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_write1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u8 x = 0;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	memcpy(&ptr, &x, sizeof(x));
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/*
+ * A bpf_dynptr can't be used as a dynptr if an offset into it has been
+ * written into
+ */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_write2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u8 x = 0, y = 0;
+
+	bpf_dynptr_alloc(sizeof(x), 0, &ptr);
+
+	memcpy((void *)&ptr + 8, &y, sizeof(y));
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/*
+ * A bpf_dynptr can't be used as a dynptr if a non-const offset into it
+ * has been written into
+ */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_write3(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char stack_buf[16];
+	unsigned long len;
+	__u8 x = 0;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	memcpy(stack_buf, &val, sizeof(val));
+	len = stack_buf[0] & 0xf;
+
+	memcpy((void *)&ptr + len, &x, sizeof(x));
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+static int invalid_write4_callback(__u32 index, void *data)
+{
+	*(__u32 *)data = 123;
+
+	return 0;
+}
+
+/* If the dynptr is written into in a callback function, it should
+ * be invalidated as a dynptr
+ */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_write4(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u64 x = 0;
+
+	bpf_dynptr_alloc(sizeof(x), 0, &ptr);
+
+	bpf_loop(10, invalid_write4_callback, &ptr, 0);
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A globally-defined bpf_dynptr can't be used (it must reside as a stack frame) */
+struct bpf_dynptr global_dynptr;
+SEC("?raw_tp/sys_nanosleep")
+int global(void *ctx)
+{
+	/* this should fail */
+	bpf_dynptr_alloc(4, 0, &global_dynptr);
+
+	bpf_dynptr_put(&global_dynptr);
+
+	return 0;
+}
+
+/* A direct read should fail */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_read1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u32 x = 2;
+
+	bpf_dynptr_alloc(sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	val = *(int *)&ptr;
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A direct read at an offset should fail */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_read2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u64 x = 0;
+
+	bpf_dynptr_alloc(sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 1, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A direct read at an offset into the lower stack slot should fail */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_read3(void *ctx)
+{
+	struct bpf_dynptr ptr1 = {};
+	struct bpf_dynptr ptr2 = {};
+
+	bpf_dynptr_alloc(sizeof(val), 0, &ptr1);
+	bpf_dynptr_alloc(sizeof(val), 0, &ptr2);
+
+	/* this should fail */
+	memcpy(&val, (void *)&ptr1 + 8, sizeof(val));
+
+	bpf_dynptr_put(&ptr1);
+	bpf_dynptr_put(&ptr2);
+
+	return 0;
+}
+
+/* Calling bpf_dynptr_alloc on an offset should fail */
+SEC("?raw_tp/sys_nanosleep")
+int invalid_offset(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+
+	/* this should fail */
+	bpf_dynptr_alloc(sizeof(val), 0, &ptr + 1);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* Can't call bpf_dynptr_put twice */
+SEC("?raw_tp/sys_nanosleep")
+int put_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	/* this second put should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+static int put_twice_callback_fn(__u32 index, void *data)
+{
+	/* this should fail */
+	bpf_dynptr_put(data);
+	val = index;
+	return 0;
+}
+
+/* Test that calling bpf_dynptr_put twice, where the 2nd put happens within a
+ * calback function, fails
+ */
+SEC("?raw_tp/sys_nanosleep")
+int put_twice_callback(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	bpf_loop(10, put_twice_callback_fn, &ptr, 0);
+
+	return 0;
+}
+
+static int missing_put_callback_fn(__u32 index, void *data)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	val = index;
+
+	/* missing bpf_dynptr_put(&ptr) */
+
+	return 0;
+}
+
+/* Any dynptr initialized within a callback must have bpf_dynptr_put called */
+SEC("?raw_tp/sys_nanosleep")
+int missing_put_callback(void *ctx)
+{
+	bpf_loop(10, missing_put_callback_fn, NULL, 0);
+	return 0;
+}
+
+/* Can't access memory in a zero-slice */
+SEC("?raw_tp/sys_nanosleep")
+int zero_slice_access(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+
+	bpf_dynptr_alloc(0, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 0);
+	if (!data)
+		goto done;
+
+	/* this should fail */
+	*(__u8 *)data = 23;
+
+	val = *(__u8 *)data;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c
new file mode 100644
index 000000000000..eff6272623c2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "errno.h"
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+int err = 0;
+int val;
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_basic(void *ctx)
+{
+	char write_data[64] = "hello there, world!!";
+	char read_data[64] = {}, buf[64] = {};
+	struct bpf_dynptr ptr = {};
+	int i;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_dynptr_alloc(sizeof(write_data), 0, &ptr);
+	if (err)
+		goto done;
+
+	/* Write data into the dynptr */
+	err = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data));
+	if (err)
+		goto done;
+
+	/* Read the data that was written into the dynptr */
+	err = bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+	if (err)
+		goto done;
+
+	/* Ensure the data we read matches the data we wrote */
+	for (i = 0; i < sizeof(read_data); i++) {
+		if (read_data[i] != write_data[i]) {
+			err = 1;
+			goto done;
+		}
+	}
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_data_slice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u32 alloc_size = 16;
+	void *data;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	/* test passing in an invalid flag */
+	err = bpf_dynptr_alloc(alloc_size, 1, &ptr);
+	if (err != -EINVAL) {
+		err = 1;
+		goto done;
+	}
+	bpf_dynptr_put(&ptr);
+
+	err = bpf_dynptr_alloc(alloc_size, 0, &ptr);
+	if (err)
+		goto done;
+
+	/* Try getting a data slice that is out of range */
+	data = bpf_dynptr_data(&ptr, alloc_size + 1, 1);
+	if (data) {
+		err = 2;
+		goto done;
+	}
+
+	/* Try getting more bytes than available */
+	data = bpf_dynptr_data(&ptr, 0, alloc_size + 1);
+	if (data) {
+		err = 3;
+		goto done;
+	}
+
+	data = bpf_dynptr_data(&ptr, 0, sizeof(int));
+	if (!data) {
+		err = 4;
+		goto done;
+	}
+
+	*(__u32 *)data = 999;
+
+	err = bpf_probe_read_kernel(&val, sizeof(val), data);
+	if (err)
+		goto done;
+
+	if (val != *(int *)data)
+		err = 5;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+static int ringbuf_callback(__u32 index, void *data)
+{
+	struct sample *sample;
+
+	struct bpf_dynptr *ptr = (struct bpf_dynptr *)data;
+
+	sample = bpf_dynptr_data(ptr, 0, sizeof(*sample));
+	if (!sample) {
+		err = 2;
+		return 0;
+	}
+
+	sample->pid += val;
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_ringbuf(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	val = 100;
+
+	/* check that you can reserve a dynamic size reservation */
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &ptr);
+	if (err)
+		goto done;
+
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample) {
+		err = 1;
+		goto done;
+	}
+
+	sample->pid = 123;
+
+	/* Can pass dynptr to callback functions */
+	bpf_loop(10, ringbuf_callback, &ptr, 0);
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_alloc_zero_bytes(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+	__u8 x = 0;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_dynptr_alloc(0, 0, &ptr);
+	if (err)
+		goto done;
+
+	err = bpf_dynptr_write(&ptr, 0, &x, sizeof(x));
+	if (err != -EINVAL) {
+		err = 1;
+		goto done;
+	}
+
+	err = bpf_dynptr_read(&x, sizeof(x), &ptr, 0);
+	if (err != -EINVAL) {
+		err = 2;
+		goto done;
+	}
+	err = 0;
+
+	/* try to access memory we don't have access to */
+	data = bpf_dynptr_data(&ptr, 0, 1);
+	if (data) {
+		err = 3;
+		goto done;
+	}
+
+	data = bpf_dynptr_data(&ptr, 0, 0);
+	if (!data) {
+		err = 4;
+		goto done;
+	}
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-05-06 15:07   ` David Vernet
       [not found]     ` <CAJnrk1Yc7G9BamfcNDGXvhMbHcrebROxN97GPPNENJ9_vGF5XA@mail.gmail.com>
  2022-05-06 22:41   ` Andrii Nakryiko
  1 sibling, 1 reply; 29+ messages in thread
From: David Vernet @ 2022-05-06 15:07 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, memxor, ast, daniel, toke

Hi Joanne,

On Thu, Apr 28, 2022 at 02:10:54PM -0700, Joanne Koong wrote:
> @@ -5523,7 +5517,6 @@ static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } }
>  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
>  	[ARG_PTR_TO_MAP_VALUE]		= &map_key_value_types,
> -	[ARG_PTR_TO_UNINIT_MAP_VALUE]	= &map_key_value_types,
>  	[ARG_CONST_SIZE]		= &scalar_types,
>  	[ARG_CONST_SIZE_OR_ZERO]	= &scalar_types,
>  	[ARG_CONST_ALLOC_SIZE_OR_ZERO]	= &scalar_types,
> @@ -5537,7 +5530,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_BTF_ID]		= &btf_ptr_types,
>  	[ARG_PTR_TO_SPIN_LOCK]		= &spin_lock_types,
>  	[ARG_PTR_TO_MEM]		= &mem_types,
> -	[ARG_PTR_TO_UNINIT_MEM]		= &mem_types,
>  	[ARG_PTR_TO_ALLOC_MEM]		= &alloc_mem_types,
>  	[ARG_PTR_TO_INT]		= &int_ptr_types,
>  	[ARG_PTR_TO_LONG]		= &int_ptr_types,
> @@ -5711,8 +5703,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		return -EACCES;
>  	}

Could (or should) this go in a separate patch? Even with removing
ARG_PTR_TO_UNINIT_MAP_VALUE, it seems like this could stand on its own. Not
sure what the norm is for how granular to split patches for bpf, so even if
it could go in its own patch, feel free to disregard if you think splitting
this off is excessive / unnecessary.

> -	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> -		   base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> +	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
>  		if (type_may_be_null(arg_type) && register_is_null(reg))
>  			return 0;
>  
> @@ -5811,7 +5801,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  			verbose(env, "invalid map_ptr to access map->value\n");
>  			return -EACCES;
>  		}
> -		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
> +		meta->raw_mode = arg_type & MEM_UNINIT;

Given that we're stashing in a bool here, should this be:

	meta->raw_mode = (arg_type & MEM_UNINIT) != 0;

>  		err = check_helper_mem_access(env, regno,
>  					      meta->map_ptr->value_size, false,
>  					      meta);
> @@ -5838,11 +5828,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  			return -EACCES;
>  	} else if (arg_type == ARG_PTR_TO_FUNC) {
>  		meta->subprogno = reg->subprogno;
> -	} else if (arg_type_is_mem_ptr(arg_type)) {
> +	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
>  		/* The access to this pointer is only checked when we hit the
>  		 * next is_mem_size argument below.
>  		 */
> -		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MEM);
> +		meta->raw_mode = arg_type & MEM_UNINIT;

Same here.

>  	} else if (arg_type_is_mem_size(arg_type)) {
>  		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>  
> @@ -6189,9 +6179,9 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
>  static bool check_args_pair_invalid(enum bpf_arg_type arg_curr,
>  				    enum bpf_arg_type arg_next)
>  {
> -	return (arg_type_is_mem_ptr(arg_curr) &&
> +	return (base_type(arg_curr) == ARG_PTR_TO_MEM &&
>  	        !arg_type_is_mem_size(arg_next)) ||
> -	       (!arg_type_is_mem_ptr(arg_curr) &&
> +	       (base_type(arg_curr) != ARG_PTR_TO_MEM &&
>  		arg_type_is_mem_size(arg_next));
>  }

What do you think about this as a possibly more concise way to express that
the curr and next args differ?

	return (base_type(arg_curr) == ARG_PTR_TO_MEM) !=
		arg_type_is_mem_size(arg_next);


Looks great overall!

Thanks,
David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
       [not found]     ` <CAJnrk1Yc7G9BamfcNDGXvhMbHcrebROxN97GPPNENJ9_vGF5XA@mail.gmail.com>
@ 2022-05-06 20:32       ` David Vernet
  2022-05-06 22:46         ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: David Vernet @ 2022-05-06 20:32 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 06, 2022 at 12:09:37PM -0700, Joanne Koong wrote:
> I think the bpf philosophy leans more towards conflating related-ish
> patches into the same patchset. I think this patch could be its own
> stand-alone patchset, but it's also related to the dynptr patchset in that
> dynptrs need it to properly describe its initialization helper functions.
> I'm happy to submit this as its own patchset though if that is preferred :)

Totally up to you, if that's the BPF convention then that's fine with me.

> 
> > -     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> > > -                base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> > > +     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
> > >               if (type_may_be_null(arg_type) && register_is_null(reg))
> > >                       return 0;
> > >
> > > @@ -5811,7 +5801,7 @@ static int check_func_arg(struct bpf_verifier_env
> > *env, u32 arg,
> > >                       verbose(env, "invalid map_ptr to access
> > map->value\n");
> > >                       return -EACCES;
> > >               }
> > > -             meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
> > > +             meta->raw_mode = arg_type & MEM_UNINIT;
> >
> > Given that we're stashing in a bool here, should this be:
> >
> >         meta->raw_mode = (arg_type & MEM_UNINIT) != 0;
> >
> I think just arg_type & MEM_UNINIT is okay because it implicitly converts
> from 1 -> true, 0 -> false. This is the convention that's used elsewhere in
> the linux codebase as well

Yeah I think functionally it will work just fine as is. I saw that a few
other places in verifier.c use operators that explicitly make the result 0
or 1, e.g.:

14699
14700         env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);

But the compiler will indeed implicitly convert any nonzero value to 1 if
it's stored in a bool, so it's not necessary for correctness. It looks like
the kernel style guide also implies that using the extra operators isn't
necessary, so I think we can leave it as you have it now:
https://www.kernel.org/doc/html/latest/process/coding-style.html#using-bool

> > What do you think about this as a possibly more concise way to express that
> > the curr and next args differ?
> >
> >         return (base_type(arg_curr) == ARG_PTR_TO_MEM) !=
> >                 arg_type_is_mem_size(arg_next);
> >
> I was trying to decide between this and the more verbose expression above
> and ultimately went with the more verbose expression because it seemed more
> readable to me. But I don't feel strongly :) I'm cool with either one

I don't feel strongly either, if you think your way is more readable then
don't feel obligated to change it.

Thanks,
David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 0/6] Dynamic pointers
  2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
                   ` (5 preceding siblings ...)
  2022-04-28 21:10 ` [PATCH bpf-next v3 6/6] bpf: Dynptr tests Joanne Koong
@ 2022-05-06 22:35 ` Andrii Nakryiko
  2022-05-09 17:26   ` Joanne Koong
  6 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 22:35 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:11 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patchset implements the basics of dynamic pointers in bpf.
>
> A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> alongside the address it points to. This abstraction is useful in bpf, given
> that every memory access in a bpf program must be safe. The verifier and bpf
> helper functions can use the metadata to enforce safety guarantees for things
> such as dynamically sized strings and kernel heap allocations.
>
> From the program side, the bpf_dynptr is an opaque struct and the verifier
> will enforce that its contents are never written to by the program.
> It can only be written to through specific bpf helper functions.
>
> There are several uses cases for dynamic pointers in bpf programs. A list of
> some are: dynamically sized ringbuf reservations without any extra memcpys,
> dynamic string parsing and memory comparisons, dynamic memory allocations that
> can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> data.
>
> At a high-level, the patches are as follows:
> 1/6 - Adds MEM_UNINIT as a bpf_type_flag
> 2/6 - Adds verifier support for dynptrs and implements malloc dynptrs
> 3/6 - Adds dynptr support for ring buffers
> 4/6 - Adds bpf_dynptr_read and bpf_dynptr_write
> 5/6 - Adds dynptr data slices (ptr to underlying dynptr memory)
> 6/6 - Tests to check that verifier rejects certain fail cases and passes
> certain success cases
>
> This is the first dynptr patchset in a larger series. The next series of
> patches will add persisting dynamic memory allocations in maps, parsing packet
> data through dynptrs, convenience helpers for using dynptrs as iterators, and
> more helper functions for interacting with strings and memory dynamically.
>
> Changelog:
> ----------
> v3 -> v2:
> v2: https://lore.kernel.org/bpf/20220416063429.3314021-1-joannelkoong@gmail.com/
>
> * Reorder patches (move ringbuffer patch to be right after the verifier + malloc
> dynptr patchset)
> * Remove local type dynptrs (Andrii + Alexei)
> * Mark stack slot as STACK_MISC after any writes into a dynptr instead of
> * explicitly prohibiting writes (Alexei)
> * Pass number of slots, not memory size to is_spi_bounds_valid (Kumar)
> * Check reference leaks by adding dynptr id to state->refs instead of checking
> stack slots (Alexei)
>

There seems to be test_progs-no_alu32 failure in CI ([0]), please take a look

  [0] https://github.com/kernel-patches/bpf/runs/6223168213?check_suite_focus=true#step:6:6430

> v1 -> v2:
> v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/
>
> 1/7 -
>     * Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
>       ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
>     * Drop arg_type_is_mem_ptr() wrapper function (Andrii)
> 2/7 -
>     * Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
>     * Use meta.release_ref instead of ref_obj_id != 0 to determine whether
>       to release reference (Kumar)
>     * Drop type_is_release_mem() wrapper function (Andrii)
> 3/7 -
>     * Add checks for nested dynptrs edge-cases, which could lead to corrupt
>     * writes of the dynptr stack variable.
>     * Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
>     * Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
>       (Alexei)
>     * Support alloc flag __GFP_ZERO (Andrii)
>     * Reserve upper 8 bits in dynptr size and offset fields instead of
>       reserving just the upper 4 bits (Andrii)
>     * Allow dynptr zero-slices (Andrii)
>     * Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
>     * Rename check_* functions to is_* functions for better readability
>       (Andrii)
>     * Add comment for code that checks the spi bounds (Andrii)
> 4/7 -
>     * Fix doc description for bpf_dynpt_read (Toke)
>     * Move bpf_dynptr_check_off_len() from function patch 1 to here (Andrii)
> 5/7 -
>     * When finding the id for the dynptr to associate the data slice with,
>       look for dynptr arg instead of assuming it is BPF_REG_1.
> 6/7 -
>     * Add __force when casting from unsigned long to void * (kernel test robot)
>     * Expand on docs for ringbuf dynptr APIs (Andrii)
> 7/7 -
>     * Use table approach for defining test programs and error messages (Andrii)
>     * Print out full log if there’s an error (Andrii)
>     * Use bpf_object__find_program_by_name() instead of specifying
>       program name as a string (Andrii)
>     * Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
>       invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
>       and test_alloc_zero_bytes
>     * Add checking for edge cases (eg allocing with invalid flags)
>
>
> Joanne Koong (6):
>   bpf: Add MEM_UNINIT as a bpf_type_flag
>   bpf: Add verifier support for dynptrs and implement malloc dynptrs
>   bpf: Dynptr support for ring buffers
>   bpf: Add bpf_dynptr_read and bpf_dynptr_write
>   bpf: Add dynptr data slices
>   bpf: Dynptr tests
>
>  include/linux/bpf.h                           | 103 +++-
>  include/linux/bpf_verifier.h                  |  21 +
>  include/uapi/linux/bpf.h                      |  96 +++
>  kernel/bpf/helpers.c                          | 169 +++++-
>  kernel/bpf/ringbuf.c                          |  71 +++
>  kernel/bpf/verifier.c                         | 329 +++++++++-
>  scripts/bpf_doc.py                            |   2 +
>  tools/include/uapi/linux/bpf.h                |  96 +++
>  .../testing/selftests/bpf/prog_tests/dynptr.c | 132 ++++
>  .../testing/selftests/bpf/progs/dynptr_fail.c | 574 ++++++++++++++++++
>  .../selftests/bpf/progs/dynptr_success.c      | 218 +++++++
>  11 files changed, 1774 insertions(+), 37 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
  2022-05-06 15:07   ` David Vernet
@ 2022-05-06 22:41   ` Andrii Nakryiko
  1 sibling, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 22:41 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Instead of having uninitialized versions of arguments as separate
> bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
> of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
> modifier to denote that the argument is uninitialized.
>
> Doing so cleans up some of the logic in the verifier. We no longer
> need to do two checks against an argument type (eg "if
> (base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
> ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
> versions of the same argument type will now share the same base type.
>
> In the near future, MEM_UNINIT will be used by dynptr helper functions
> as well.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

LGTM, see minor suggestion below

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  include/linux/bpf.h   | 17 +++++++++--------
>  kernel/bpf/helpers.c  |  4 ++--
>  kernel/bpf/verifier.c | 26 ++++++++------------------
>  3 files changed, 19 insertions(+), 28 deletions(-)
>

[...]

> @@ -6189,9 +6179,9 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
>  static bool check_args_pair_invalid(enum bpf_arg_type arg_curr,
>                                     enum bpf_arg_type arg_next)
>  {
> -       return (arg_type_is_mem_ptr(arg_curr) &&
> +       return (base_type(arg_curr) == ARG_PTR_TO_MEM &&
>                 !arg_type_is_mem_size(arg_next)) ||
> -              (!arg_type_is_mem_ptr(arg_curr) &&
> +              (base_type(arg_curr) != ARG_PTR_TO_MEM &&
>                 arg_type_is_mem_size(arg_next));

trying to parse this check I realized that it can be written as !=
(basically it's a XOR, both conditions are either true or both are
false)

return (base_type(arg_curr) == ARG_PTR_TO_MEM) !=
arg_type_is_mem_size(arg_next);


>  }
>
> @@ -6203,7 +6193,7 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn)
>          * helper function specification.
>          */
>         if (arg_type_is_mem_size(fn->arg1_type) ||
> -           arg_type_is_mem_ptr(fn->arg5_type)  ||
> +           base_type(fn->arg5_type) == ARG_PTR_TO_MEM ||
>             check_args_pair_invalid(fn->arg1_type, fn->arg2_type) ||
>             check_args_pair_invalid(fn->arg2_type, fn->arg3_type) ||
>             check_args_pair_invalid(fn->arg3_type, fn->arg4_type) ||
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-05-06 20:32       ` David Vernet
@ 2022-05-06 22:46         ` Andrii Nakryiko
  2022-05-07  1:48           ` David Vernet
  2022-05-09 17:10           ` Joanne Koong
  0 siblings, 2 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 22:46 UTC (permalink / raw)
  To: void
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 1:32 PM David Vernet <void@manifault.com> wrote:
>
> On Fri, May 06, 2022 at 12:09:37PM -0700, Joanne Koong wrote:
> > I think the bpf philosophy leans more towards conflating related-ish
> > patches into the same patchset. I think this patch could be its own
> > stand-alone patchset, but it's also related to the dynptr patchset in that
> > dynptrs need it to properly describe its initialization helper functions.
> > I'm happy to submit this as its own patchset though if that is preferred :)
>
> Totally up to you, if that's the BPF convention then that's fine with me.

You meant

- [ARG_PTR_TO_UNINIT_MEM]         = &mem_types,

parts as stand-alone patch? That would be invalid on its own without
adding MEM_UNINT, so would potentially break bisection. So no, it
shouldn't be a stand-alone patch. Each patch has to be logically
separate but not causing any regressions in behavior, compilation,
selftest, etc. So, for example, while we normally put selftests into
separate tests, if kernel change breaks selftests, selftests have to
be fixed in the same patch to avoid having any point where bisection
can detect the breakage.


>
> >
> > > -     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> > > > -                base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> > > > +     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
> > > >               if (type_may_be_null(arg_type) && register_is_null(reg))
> > > >                       return 0;
> > > >
> > > > @@ -5811,7 +5801,7 @@ static int check_func_arg(struct bpf_verifier_env
> > > *env, u32 arg,
> > > >                       verbose(env, "invalid map_ptr to access
> > > map->value\n");
> > > >                       return -EACCES;
> > > >               }
> > > > -             meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
> > > > +             meta->raw_mode = arg_type & MEM_UNINIT;
> > >
> > > Given that we're stashing in a bool here, should this be:
> > >
> > >         meta->raw_mode = (arg_type & MEM_UNINIT) != 0;
> > >
> > I think just arg_type & MEM_UNINIT is okay because it implicitly converts
> > from 1 -> true, 0 -> false. This is the convention that's used elsewhere in
> > the linux codebase as well
>
> Yeah I think functionally it will work just fine as is. I saw that a few
> other places in verifier.c use operators that explicitly make the result 0
> or 1, e.g.:
>
> 14699
> 14700         env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
>
> But the compiler will indeed implicitly convert any nonzero value to 1 if
> it's stored in a bool, so it's not necessary for correctness. It looks like
> the kernel style guide also implies that using the extra operators isn't
> necessary, so I think we can leave it as you have it now:
> https://www.kernel.org/doc/html/latest/process/coding-style.html#using-bool

Yeah, the above example is rather unusual, I'd say. We do
!!(bool_expr) only when we want to assign that to integer (not bool)
variable/field as 0 or 1. Otherwise it's a well-defined compiler
conversion rule for any non-zero value to be true during bool
conversion.

>
> > > What do you think about this as a possibly more concise way to express that
> > > the curr and next args differ?
> > >
> > >         return (base_type(arg_curr) == ARG_PTR_TO_MEM) !=
> > >                 arg_type_is_mem_size(arg_next);
> > >
> > I was trying to decide between this and the more verbose expression above
> > and ultimately went with the more verbose expression because it seemed more
> > readable to me. But I don't feel strongly :) I'm cool with either one
>
> I don't feel strongly either, if you think your way is more readable then
> don't feel obligated to change it.
>

Heh, this also caught my eye. It's subjective, but inequality is
shorter and more readable (even in terms of the logic it expresses).
But it's fine either way with me.

> Thanks,
> David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs
  2022-04-28 21:10 ` [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs Joanne Koong
@ 2022-05-06 23:30   ` Andrii Nakryiko
  2022-05-09 18:58     ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 23:30 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patch adds the bulk of the verifier work for supporting dynamic
> pointers (dynptrs) in bpf. This patch implements malloc-type dynptrs
> through 2 new APIs (bpf_dynptr_alloc and bpf_dynptr_put) that can be
> called by a bpf program. Malloc-type dynptrs are dynptrs that dynamically
> allocate memory on behalf of the program.
>
> A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
> defined internally as:
>
> struct bpf_dynptr_kern {
>     void *data;
>     u32 size;
>     u32 offset;
> } __aligned(8);
>
> The upper 8 bits of *size* is reserved (it contains extra metadata about
> read-only status and dynptr type); consequently, a dynptr only supports
> memory less than 16 MB.
>
> The 2 new APIs for malloc-type dynptrs are:
>
> long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_dynptr_put(struct bpf_dynptr *ptr);
>
> Please note that there *must* be a corresponding bpf_dynptr_put for
> every bpf_dynptr_alloc (even if the alloc fails). This is enforced
> by the verifier.
>
> In the verifier, dynptr state information will be tracked in stack
> slots. When the program passes in an uninitialized dynptr
> (ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
> to the frame pointer where the dynptr resides at are marked STACK_DYNPTR.
>
> For helper functions that take in initialized dynptrs (eg
> bpf_dynptr_read + bpf_dynptr_write which are added later in this
> patchset), the verifier enforces that the dynptr has been initialized
> properly by checking that their corresponding stack slots have been marked
> as STACK_DYNPTR. Dynptr release functions (eg bpf_dynptr_put) will clear the
> stack slots. The verifier enforces at program exit that there are no
> referenced dynptrs that haven't been released.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  60 ++++++++-
>  include/linux/bpf_verifier.h   |  21 +++
>  include/uapi/linux/bpf.h       |  30 +++++
>  kernel/bpf/helpers.c           |  75 +++++++++++
>  kernel/bpf/verifier.c          | 225 ++++++++++++++++++++++++++++++++-
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  30 +++++
>  7 files changed, 440 insertions(+), 3 deletions(-)
>

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4565684839f1..16b7ea54a7e0 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -187,6 +187,11 @@ struct bpf_verifier_stack_elem {
>                                           POISON_POINTER_DELTA))
>  #define BPF_MAP_PTR(X)         ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>
> +/* forward declarations */

it's kind of obvious from C syntax, this comment doesn't really add value

> +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
> +static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
> +

[...]

>  static struct bpf_func_state *func(struct bpf_verifier_env *env,
>                                    const struct bpf_reg_state *reg)
>  {
> @@ -646,6 +672,134 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
>         env->scratched_stack_slots = ~0ULL;
>  }
>
> +#define DYNPTR_TYPE_FLAG_MASK          DYNPTR_TYPE_MALLOC

can this be put near where DYNPTR_TYPE_MALLOC is defined? It's quite
easy to forget to update this if it's somewhere far away

> +
> +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> +{
> +       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +       case DYNPTR_TYPE_MALLOC:
> +               return BPF_DYNPTR_TYPE_MALLOC;
> +       default:
> +               return BPF_DYNPTR_TYPE_INVALID;
> +       }
> +}
> +

[...]

> +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       int spi, i;
> +
> +       spi = get_spi(reg->off);
> +
> +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +               return -EINVAL;
> +
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +               state->stack[spi].slot_type[i] = STACK_INVALID;
> +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> +       }
> +
> +       /* Invalidate any slices associated with this dynptr */
> +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> +               release_reference(env, state->stack[spi].spilled_ptr.id);
> +               state->stack[spi].spilled_ptr.id = 0;
> +               state->stack[spi - 1].spilled_ptr.id = 0;
> +       }
> +
> +       state->stack[spi].spilled_ptr.dynptr.type = 0;
> +       state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> +
> +       state->stack[spi].spilled_ptr.dynptr.first_slot = false;

nit: given you marked slots as STACK_INVALID, we shouldn't even look
into spilled_ptr.dynptr, so this zeroing out isn't necessary, right?

> +
> +       return 0;
> +}
> +
> +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       int spi = get_spi(reg->off);
> +       int i;
> +
> +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +               return true;

hm... if spi bounds are invalid, shouldn't that be an error?...

> +
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +               if (state->stack[spi].slot_type[i] == STACK_DYNPTR ||
> +                   state->stack[spi - 1].slot_type[i] == STACK_DYNPTR)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +
> +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +                                    enum bpf_arg_type arg_type)
> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       int spi = get_spi(reg->off);
> +       int i;
> +
> +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +           !state->stack[spi].spilled_ptr.dynptr.first_slot)
> +               return false;
> +
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +           if (state->stack[spi].slot_type[i] != STACK_DYNPTR ||
> +               state->stack[spi - 1].slot_type[i] != STACK_DYNPTR)
> +                   return false;
> +       }

minor, but seems like it's pretty common operation to set or check all
those "microslots" to STACK_DYNPTR, would two small helpers be useful
for this (e.g., is_stack_dynptr() and set_stack_dynptr() or something
along those lines)?

> +
> +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> +       if (arg_type == ARG_PTR_TO_DYNPTR)
> +               return true;
> +
> +       return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> +}
> +

[...]

> @@ -5837,6 +6011,35 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>
>                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> +       } else if (arg_type_is_dynptr(arg_type)) {
> +               /* Can't pass in a dynptr at a weird offset */
> +               if (reg->off % BPF_REG_SIZE) {
> +                       verbose(env, "cannot pass in non-zero dynptr offset\n");
> +                       return -EINVAL;
> +               }
> +
> +               if (arg_type & MEM_UNINIT)  {
> +                       if (!is_dynptr_reg_valid_uninit(env, reg)) {
> +                               verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> +                                       arg + BPF_REG_1);
> +                               return -EINVAL;
> +                       }
> +
> +                       meta->uninit_dynptr_regno = arg + BPF_REG_1;

do we need a check that meta->uninit_dynptr_regno isn't already set?
I.e., prevent two uninit dynptr in a helper?

> +               } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> +                       const char *err_extra = "";
> +
> +                       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +                       case DYNPTR_TYPE_MALLOC:
> +                               err_extra = "malloc ";
> +                               break;
> +                       default:
> +                               break;
> +                       }
> +                       verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> +                               err_extra, arg + BPF_REG_1);
> +                       return -EINVAL;
> +               }
>         } else if (arg_type_is_alloc_size(arg_type)) {
>                 if (!tnum_is_const(reg->var_off)) {
>                         verbose(env, "R%d is not a known constant'\n",

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-04-28 21:10 ` [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers Joanne Koong
@ 2022-05-06 23:41   ` Andrii Nakryiko
  2022-05-09 19:44     ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 23:41 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Currently, our only way of writing dynamically-sized data into a ring
> buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> memcpy, but it can only safely support reservation sizes that are
> statically known since the verifier cannot guarantee that the bpf
> program won’t access memory outside the reserved space.
>
> The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> reservations without the extra memcpy.
>
> There are 3 new APIs:
>
> long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
>
> These closely follow the functionalities of the original ringbuf APIs.
> For example, all ringbuffer dynptrs that have been reserved must be
> either submitted or discarded before the program exits.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

Looks great! Modulo those four underscores, they are super confusing...

>  include/linux/bpf.h            | 10 ++++-
>  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
>  kernel/bpf/helpers.c           |  6 +++
>  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 18 +++++++--
>  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
>  6 files changed, 171 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 757440406962..10efbec99e93 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -394,7 +394,10 @@ enum bpf_type_flag {
>         /* DYNPTR points to dynamically allocated memory. */
>         DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
> +       /* DYNPTR points to a ringbuf record. */
> +       DYNPTR_TYPE_RINGBUF     = BIT(9 + BPF_BASE_TYPE_BITS),
> +
> +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_RINGBUF,

it's getting a bit old to have to update __BPF_TYPE_LAST_FLAG all the
time, maybe let's do this:

__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,

and never touch it again?

>  };
>

[...]

> + *
> + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> + *     Description
> + *             Discard reserved ring buffer sample through the dynptr
> + *             interface. This is a no-op if the dynptr is invalid/null.
> + *
> + *             For more information on *flags*, please see
> + *             'bpf_ringbuf_discard'.
> + *     Return
> + *             Nothing. Always succeeds.
>   */

let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
sure which one is more appropriate, probably just null one), so we can
check in code whether some reservation was successful without knowing
bpf_ringbuf_reserve_dynptr()'s return value


>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \

[...]

> +BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
> +          struct bpf_dynptr_kern *, ptr)
> +{
> +       void *sample;
> +       int err;
> +
> +       err = bpf_dynptr_check_size(size);
> +       if (err) {
> +               bpf_dynptr_set_null(ptr);
> +               return err;
> +       }
> +
> +       sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags);

I was so confused by these four underscored for a bit... Is this
what's defined inside BPF_CALL_4 (and thus makes it ungreppable). Can
you instead just open-code container_of and __bpf_ringbuf_reserve()
directly to make it a bit easier to follow? And flags check as well.
It will so much easier to understand what's going on.

> +
> +       if (!sample) {
> +               bpf_dynptr_set_null(ptr);
> +               return -EINVAL;
> +       }
> +
> +       bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
> +       .func           = bpf_ringbuf_reserve_dynptr,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_CONST_MAP_PTR,
> +       .arg2_type      = ARG_ANYTHING,
> +       .arg3_type      = ARG_ANYTHING,
> +       .arg4_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
> +};
> +
> +BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
> +{
> +       if (!ptr->data)
> +               return 0;
> +
> +       ____bpf_ringbuf_submit(ptr->data, flags);

this just calls bpf_ringbuf_commit(), let's do it here explicitly as well

> +
> +       bpf_dynptr_set_null(ptr);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
> +       .func           = bpf_ringbuf_submit_dynptr,
> +       .ret_type       = RET_VOID,
> +       .arg1_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
> +       .arg2_type      = ARG_ANYTHING,
> +};
> +
> +BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
> +{
> +       if (!ptr->data)
> +               return 0;
> +
> +       ____bpf_ringbuf_discard(ptr->data, flags);
> +

ditto


> +       bpf_dynptr_set_null(ptr);
> +
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-28 21:10 ` [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
@ 2022-05-06 23:48   ` Andrii Nakryiko
  2022-05-09 17:15     ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 23:48 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patch adds two helper functions, bpf_dynptr_read and
> bpf_dynptr_write:
>
> long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
>
> long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
>
> The dynptr passed into these functions must be valid dynptrs that have
> been initialized.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            | 16 ++++++++++
>  include/uapi/linux/bpf.h       | 19 ++++++++++++
>  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 19 ++++++++++++
>  4 files changed, 110 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 10efbec99e93..b276dbf942dd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2387,6 +2387,12 @@ enum bpf_dynptr_type {
>  #define DYNPTR_SIZE_MASK       0xFFFFFF
>  #define DYNPTR_TYPE_SHIFT      28
>  #define DYNPTR_TYPE_MASK       0x7
> +#define DYNPTR_RDONLY_BIT      BIT(31)
> +
> +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> +{
> +       return ptr->size & DYNPTR_RDONLY_BIT;
> +}
>
>  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
>  {
> @@ -2408,6 +2414,16 @@ static inline int bpf_dynptr_check_size(u32 size)
>         return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
>  }
>
> +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> +{
> +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;

didn't you specify that size excludes offset, so size is a capacity?

  +       /* Size represents the number of usable bytes in the dynptr.
  +        * If for example the offset is at 200 for a malloc dynptr with
  +        * allocation size 256, the number of usable bytes is 56.

> +
> +       if (len > capacity || offset > capacity - len)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
>  void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
>                      u32 offset, u32 size);
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 679f960d2514..2d539930b7b2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5209,6 +5209,23 @@ union bpf_attr {
>   *             'bpf_ringbuf_discard'.
>   *     Return
>   *             Nothing. Always succeeds.
> + *
> + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> + *     Description
> + *             Read *len* bytes from *src* into *dst*, starting from *offset*
> + *             into *src*.
> + *     Return
> + *             0 on success, -EINVAL if *offset* + *len* exceeds the length

this sounds more like E2BIG ?

> + *             of *src*'s data or if *src* is an invalid dynptr.
> + *

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices
  2022-04-28 21:10 ` [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices Joanne Koong
@ 2022-05-06 23:57   ` Andrii Nakryiko
  2022-05-09 17:21     ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-06 23:57 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patch adds a new helper function
>
> void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);
>
> which returns a pointer to the underlying data of a dynptr. *len*
> must be a statically known value. The bpf program may access the returned
> data slice as a normal buffer (eg can do direct reads and writes), since
> the verifier associates the length with the returned pointer, and
> enforces that no out of bounds accesses occur.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  4 +++
>  include/uapi/linux/bpf.h       | 12 +++++++
>  kernel/bpf/helpers.c           | 28 +++++++++++++++
>  kernel/bpf/verifier.c          | 64 ++++++++++++++++++++++++++++++----
>  tools/include/uapi/linux/bpf.h | 12 +++++++
>  5 files changed, 114 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b276dbf942dd..4d2de868bdbc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -397,6 +397,9 @@ enum bpf_type_flag {
>         /* DYNPTR points to a ringbuf record. */
>         DYNPTR_TYPE_RINGBUF     = BIT(9 + BPF_BASE_TYPE_BITS),
>
> +       /* MEM is memory owned by a dynptr */
> +       MEM_DYNPTR              = BIT(10 + BPF_BASE_TYPE_BITS),

do we need this yet another bit? It seems like it only matters for
verifier log dynptr_ output? Can we just reuse MEM_ALLOC? Or there is
some ringbuf-specific logic that we'll interfere with? If feels a bit
unnecessary, let's think if we can avoid adding bits just for this.

> +
>         __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_RINGBUF,
>  };
>

[...]

> +               if (is_dynptr_ref_function(func_id)) {
> +                       int i;
> +
> +                       /* Find the id of the dynptr we're acquiring a reference to */
> +                       for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> +                               if (arg_type_is_dynptr(fn->arg_type[i])) {
> +                                       id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);

let's make sure that we have only one such argument?

> +                                       break;
> +                               }
> +                       }
> +                       if (unlikely(i == MAX_BPF_FUNC_REG_ARGS)) {

please don't use unlikely(), especially for non-performance critical code path

> +                               verbose(env, "verifier internal error: no dynptr args to a dynptr ref function");
> +                               return -EFAULT;
> +                       }
> +               } else {
> +                       id = acquire_reference_state(env, insn_idx);
> +                       if (id < 0)
> +                               return id;
> +               }

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 6/6] bpf: Dynptr tests
  2022-04-28 21:10 ` [PATCH bpf-next v3 6/6] bpf: Dynptr tests Joanne Koong
@ 2022-05-07  0:09   ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-07  0:09 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> This patch adds tests for dynptrs, which include cases that the
> verifier needs to reject (for example, invalid bpf_dynptr_put usages
> and  invalid writes/reads), as well as cases that should successfully
> pass.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

Apart from a few nits, this looks awesome. Great coverage!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  .../testing/selftests/bpf/prog_tests/dynptr.c | 132 ++++
>  .../testing/selftests/bpf/progs/dynptr_fail.c | 574 ++++++++++++++++++
>  .../selftests/bpf/progs/dynptr_success.c      | 218 +++++++
>  3 files changed, 924 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> new file mode 100644
> index 000000000000..0bed39fd8dac
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Facebook */
> +
> +#include <test_progs.h>
> +#include "dynptr_fail.skel.h"
> +#include "dynptr_success.skel.h"
> +
> +size_t log_buf_sz = 1048576; /* 1 MB */

static

> +static char obj_log_buf[1048576];
> +
> +struct {

static

> +       const char *prog_name;
> +       const char *expected_err_msg;
> +} dynptr_tests[] = {
> +       /* failure cases */
> +       {"missing_put", "Unreleased reference id=1"},
> +       {"missing_put_callback", "Unreleased reference id=1"},
> +       {"put_nonalloc", "Expected an initialized malloc dynptr as arg #1"},
> +       {"put_data_slice", "type=dynptr_mem expected=fp"},
> +       {"put_uninit_dynptr", "arg 1 is an unacquired reference"},
> +       {"use_after_put", "Expected an initialized dynptr as arg #3"},
> +       {"alloc_twice", "Arg #3 dynptr has to be an uninitialized dynptr"},
> +       {"add_dynptr_to_map1", "invalid indirect read from stack"},
> +       {"add_dynptr_to_map2", "invalid indirect read from stack"},
> +       {"ringbuf_invalid_access", "invalid mem access 'scalar'"},
> +       {"ringbuf_invalid_api", "type=dynptr_mem expected=alloc_mem"},
> +       {"ringbuf_out_of_bounds", "value is outside of the allowed memory range"},
> +       {"data_slice_out_of_bounds", "value is outside of the allowed memory range"},
> +       {"data_slice_use_after_put", "invalid mem access 'scalar'"},
> +       {"invalid_helper1", "invalid indirect read from stack"},
> +       {"invalid_helper2", "Expected an initialized dynptr as arg #3"},
> +       {"invalid_write1", "Expected an initialized malloc dynptr as arg #1"},
> +       {"invalid_write2", "Expected an initialized dynptr as arg #3"},
> +       {"invalid_write3", "Expected an initialized malloc dynptr as arg #1"},
> +       {"invalid_write4", "arg 1 is an unacquired reference"},
> +       {"invalid_read1", "invalid read from stack"},
> +       {"invalid_read2", "cannot pass in non-zero dynptr offset"},
> +       {"invalid_read3", "invalid read from stack"},
> +       {"invalid_offset", "invalid write to stack"},
> +       {"global", "R3 type=map_value expected=fp"},
> +       {"put_twice", "arg 1 is an unacquired reference"},
> +       {"put_twice_callback", "arg 1 is an unacquired reference"},
> +       {"zero_slice_access", "invalid access to memory, mem_size=0 off=0 size=1"},
> +       /* success cases */
> +       {"test_basic", NULL},
> +       {"test_data_slice", NULL},
> +       {"test_ringbuf", NULL},
> +       {"test_alloc_zero_bytes", NULL},
> +};
> +
> +static void verify_fail(const char *prog_name, const char *expected_err_msg)
> +{
> +       LIBBPF_OPTS(bpf_object_open_opts, opts);
> +       struct bpf_program *prog;
> +       struct dynptr_fail *skel;
> +       int err;
> +
> +       opts.kernel_log_buf = obj_log_buf;
> +       opts.kernel_log_size = log_buf_sz;
> +       opts.kernel_log_level = 1;
> +
> +       skel = dynptr_fail__open_opts(&opts);
> +       if (!ASSERT_OK_PTR(skel, "dynptr_fail__open_opts"))
> +               goto cleanup;
> +
> +       prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> +       if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +               goto cleanup;
> +
> +       bpf_program__set_autoload(prog, true);
> +
> +       err = dynptr_fail__load(skel);
> +       if (!ASSERT_ERR(err, "unexpected load success"))
> +               goto cleanup;
> +
> +       if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
> +               fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
> +               fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
> +       }
> +
> +cleanup:
> +       dynptr_fail__destroy(skel);
> +}
> +
> +static void verify_success(const char *prog_name)
> +{
> +       struct dynptr_success *skel;
> +       struct bpf_program *prog;
> +       struct bpf_link *link;
> +
> +       skel = dynptr_success__open();
> +       if (!ASSERT_OK_PTR(skel, "dynptr_success__open"))
> +               return;
> +
> +       skel->bss->pid = getpid();
> +
> +       dynptr_success__load(skel);
> +       if (!ASSERT_OK_PTR(skel, "dynptr_success__load"))
> +               goto cleanup;
> +
> +       prog = bpf_object__find_program_by_name(skel->obj, prog_name);
> +       if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
> +               goto cleanup;
> +
> +       link = bpf_program__attach(prog);
> +       if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
> +               goto cleanup;
> +
> +       usleep(1);
> +
> +       ASSERT_EQ(skel->bss->err, 0, "err");
> +
> +       bpf_link__destroy(link);
> +
> +cleanup:
> +       dynptr_success__destroy(skel);
> +}
> +
> +void test_dynptr(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(dynptr_tests); i++) {
> +               if (!test__start_subtest(dynptr_tests[i].prog_name))
> +                       continue;
> +
> +               if (dynptr_tests[i].expected_err_msg)
> +                       verify_fail(dynptr_tests[i].prog_name, dynptr_tests[i].expected_err_msg);
> +               else
> +                       verify_success(dynptr_tests[i].prog_name);
> +       }
> +}

overall flow and structure looks great and clean, great job!

> diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
> new file mode 100644
> index 000000000000..e4d5464e1865
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
> @@ -0,0 +1,574 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Facebook */
> +
> +#include <string.h>
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_misc.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __uint(max_entries, 1);
> +       __type(key, __u32);
> +       __type(value, struct bpf_dynptr);
> +} array_map SEC(".maps");
> +
> +struct sample {
> +       int pid;
> +       long value;
> +       char comm[16];
> +};
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_RINGBUF);
> +       __uint(max_entries, 1 << 12);

this is problematic for other architectures, let's drop max_entries
from here and do bpf_program__set_max_entries(skel->maps.ringbuf,
pagesize()) from user-space. We went though this process before, let's
not regress

> +} ringbuf SEC(".maps");
> +
> +int err = 0;
> +int val;
> +
> +/* Every bpf_dynptr_alloc call must have a corresponding bpf_dynptr_put call */
> +SEC("?raw_tp/sys_nanosleep")
> +int missing_put(void *ctx)
> +{
> +       struct bpf_dynptr mem;
> +
> +       bpf_dynptr_alloc(8, 0, &mem);
> +
> +       /* missing a call to bpf_dynptr_put(&mem) */
> +
> +       return 0;
> +}
> +
> +/* A non-alloc-ed dynptr can't be used by bpf_dynptr_put */
> +SEC("?raw_tp/sys_nanosleep")
> +int put_nonalloc(void *ctx)
> +{
> +       struct bpf_dynptr ptr;

nit: empty line after variable declaration

> +       bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &ptr);
> +
> +       /* this should fail */
> +       bpf_dynptr_put(&ptr);
> +
> +       return 0;
> +}
> +

[...]

> +int pid = 0;
> +int err = 0;
> +int val;
> +
> +struct sample {
> +       int pid;
> +       int seq;
> +       long value;
> +       char comm[16];
> +};
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_RINGBUF);
> +       __uint(max_entries, 1 << 12);

same about ringbuf sizing

> +} ringbuf SEC(".maps");
> +
> +SEC("tp/syscalls/sys_enter_nanosleep")
> +int test_basic(void *ctx)
> +{
> +       char write_data[64] = "hello there, world!!";
> +       char read_data[64] = {}, buf[64] = {};
> +       struct bpf_dynptr ptr = {};
> +       int i;
> +
> +       if (bpf_get_current_pid_tgid() >> 32 != pid)
> +               return 0;
> +
> +       err = bpf_dynptr_alloc(sizeof(write_data), 0, &ptr);
> +       if (err)
> +               goto done;

you don't really need `if (err) goto done` with dynptr APIs, you can have

err = bpf_dynptr_alloc(...)
err = err ?: bpf_dynptr_write();
err = err ?: bpf_dynptr_read();

That's how I'd do it in real application as well to eliminate all this
if/goto checks

> +
> +       /* Write data into the dynptr */
> +       err = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data));
> +       if (err)
> +               goto done;
> +

[...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-05-06 22:46         ` Andrii Nakryiko
@ 2022-05-07  1:48           ` David Vernet
  2022-05-09 17:10           ` Joanne Koong
  1 sibling, 0 replies; 29+ messages in thread
From: David Vernet @ 2022-05-07  1:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 06, 2022 at 03:46:19PM -0700, Andrii Nakryiko wrote:
> You meant
> 
> - [ARG_PTR_TO_UNINIT_MEM]         = &mem_types,
> 
> parts as stand-alone patch? That would be invalid on its own without
> adding MEM_UNINT, so would potentially break bisection. So no, it
> shouldn't be a stand-alone patch. Each patch has to be logically
> separate but not causing any regressions in behavior, compilation,
> selftest, etc. So, for example, while we normally put selftests into
> separate tests, if kernel change breaks selftests, selftests have to
> be fixed in the same patch to avoid having any point where bisection
> can detect the breakage.

Thanks for clarifying, Andrii. I misunderstood the existing verifier code
while I was reviewing the diff and mistakenly thought it was safe to remove
these lines without MEM_UNINIT.  Apologies for the noise / confusion.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-05-06 22:46         ` Andrii Nakryiko
  2022-05-07  1:48           ` David Vernet
@ 2022-05-09 17:10           ` Joanne Koong
  1 sibling, 0 replies; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 17:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: void, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 3:46 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, May 6, 2022 at 1:32 PM David Vernet <void@manifault.com> wrote:
> >
> > On Fri, May 06, 2022 at 12:09:37PM -0700, Joanne Koong wrote:
> > > I think the bpf philosophy leans more towards conflating related-ish
> > > patches into the same patchset. I think this patch could be its own
> > > stand-alone patchset, but it's also related to the dynptr patchset in that
> > > dynptrs need it to properly describe its initialization helper functions.
> > > I'm happy to submit this as its own patchset though if that is preferred :)
> >
> > Totally up to you, if that's the BPF convention then that's fine with me.
>
> You meant
>
> - [ARG_PTR_TO_UNINIT_MEM]         = &mem_types,
>
> parts as stand-alone patch? That would be invalid on its own without
> adding MEM_UNINT, so would potentially break bisection. So no, it
> shouldn't be a stand-alone patch. Each patch has to be logically
> separate but not causing any regressions in behavior, compilation,
> selftest, etc. So, for example, while we normally put selftests into
> separate tests, if kernel change breaks selftests, selftests have to
> be fixed in the same patch to avoid having any point where bisection
> can detect the breakage.
>
Ah okay, I thought we were talking about having all of the first patch
be its standalone patch. sorry for the confusion.
>
> >
> > >
> > > > -     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> > > > > -                base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> > > > > +     } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
> > > > >               if (type_may_be_null(arg_type) && register_is_null(reg))
> > > > >                       return 0;
> > > > >
> > > > > @@ -5811,7 +5801,7 @@ static int check_func_arg(struct bpf_verifier_env
> > > > *env, u32 arg,
> > > > >                       verbose(env, "invalid map_ptr to access
> > > > map->value\n");
> > > > >                       return -EACCES;
> > > > >               }
> > > > > -             meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
> > > > > +             meta->raw_mode = arg_type & MEM_UNINIT;
> > > >
> > > > Given that we're stashing in a bool here, should this be:
> > > >
> > > >         meta->raw_mode = (arg_type & MEM_UNINIT) != 0;
> > > >
> > > I think just arg_type & MEM_UNINIT is okay because it implicitly converts
> > > from 1 -> true, 0 -> false. This is the convention that's used elsewhere in
> > > the linux codebase as well
> >
> > Yeah I think functionally it will work just fine as is. I saw that a few
> > other places in verifier.c use operators that explicitly make the result 0
> > or 1, e.g.:
> >
> > 14699
> > 14700         env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
> >
> > But the compiler will indeed implicitly convert any nonzero value to 1 if
> > it's stored in a bool, so it's not necessary for correctness. It looks like
> > the kernel style guide also implies that using the extra operators isn't
> > necessary, so I think we can leave it as you have it now:
> > https://www.kernel.org/doc/html/latest/process/coding-style.html#using-bool
>
> Yeah, the above example is rather unusual, I'd say. We do
> !!(bool_expr) only when we want to assign that to integer (not bool)
> variable/field as 0 or 1. Otherwise it's a well-defined compiler
> conversion rule for any non-zero value to be true during bool
> conversion.
>
> >
> > > > What do you think about this as a possibly more concise way to express that
> > > > the curr and next args differ?
> > > >
> > > >         return (base_type(arg_curr) == ARG_PTR_TO_MEM) !=
> > > >                 arg_type_is_mem_size(arg_next);
> > > >
> > > I was trying to decide between this and the more verbose expression above
> > > and ultimately went with the more verbose expression because it seemed more
> > > readable to me. But I don't feel strongly :) I'm cool with either one
> >
> > I don't feel strongly either, if you think your way is more readable then
> > don't feel obligated to change it.
> >
>
> Heh, this also caught my eye. It's subjective, but inequality is
> shorter and more readable (even in terms of the logic it expresses).
> But it's fine either way with me.
Since both of you think the inequality way is more readable, I will
change it to inequality for v4 then :)
>
> > Thanks,
> > David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-05-06 23:48   ` Andrii Nakryiko
@ 2022-05-09 17:15     ` Joanne Koong
  0 siblings, 0 replies; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 17:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 4:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > This patch adds two helper functions, bpf_dynptr_read and
> > bpf_dynptr_write:
> >
> > long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
> >
> > long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
> >
> > The dynptr passed into these functions must be valid dynptrs that have
> > been initialized.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            | 16 ++++++++++
> >  include/uapi/linux/bpf.h       | 19 ++++++++++++
> >  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 19 ++++++++++++
> >  4 files changed, 110 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 10efbec99e93..b276dbf942dd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2387,6 +2387,12 @@ enum bpf_dynptr_type {
> >  #define DYNPTR_SIZE_MASK       0xFFFFFF
> >  #define DYNPTR_TYPE_SHIFT      28
> >  #define DYNPTR_TYPE_MASK       0x7
> > +#define DYNPTR_RDONLY_BIT      BIT(31)
> > +
> > +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> > +{
> > +       return ptr->size & DYNPTR_RDONLY_BIT;
> > +}
> >
> >  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> >  {
> > @@ -2408,6 +2414,16 @@ static inline int bpf_dynptr_check_size(u32 size)
> >         return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
> >  }
> >
> > +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> > +{
> > +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
>
> didn't you specify that size excludes offset, so size is a capacity?
Yes, bpf_dynptr_get_size(ptr) is the capacity. I will fix this for v4
>
>   +       /* Size represents the number of usable bytes in the dynptr.
>   +        * If for example the offset is at 200 for a malloc dynptr with
>   +        * allocation size 256, the number of usable bytes is 56.
>
> > +
> > +       if (len > capacity || offset > capacity - len)
> > +               return -EINVAL;
> > +
> > +       return 0;
> > +}
> > +
> >  void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> >                      u32 offset, u32 size);
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 679f960d2514..2d539930b7b2 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5209,6 +5209,23 @@ union bpf_attr {
> >   *             'bpf_ringbuf_discard'.
> >   *     Return
> >   *             Nothing. Always succeeds.
> > + *
> > + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> > + *     Description
> > + *             Read *len* bytes from *src* into *dst*, starting from *offset*
> > + *             into *src*.
> > + *     Return
> > + *             0 on success, -EINVAL if *offset* + *len* exceeds the length
>
> this sounds more like E2BIG ?
I'll change this to -E2BIG here and in bpf_dynptr_write
>
> > + *             of *src*'s data or if *src* is an invalid dynptr.
> > + *
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices
  2022-05-06 23:57   ` Andrii Nakryiko
@ 2022-05-09 17:21     ` Joanne Koong
  2022-05-09 18:29       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 17:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 4:57 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > This patch adds a new helper function
> >
> > void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);
> >
> > which returns a pointer to the underlying data of a dynptr. *len*
> > must be a statically known value. The bpf program may access the returned
> > data slice as a normal buffer (eg can do direct reads and writes), since
> > the verifier associates the length with the returned pointer, and
> > enforces that no out of bounds accesses occur.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  4 +++
> >  include/uapi/linux/bpf.h       | 12 +++++++
> >  kernel/bpf/helpers.c           | 28 +++++++++++++++
> >  kernel/bpf/verifier.c          | 64 ++++++++++++++++++++++++++++++----
> >  tools/include/uapi/linux/bpf.h | 12 +++++++
> >  5 files changed, 114 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index b276dbf942dd..4d2de868bdbc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -397,6 +397,9 @@ enum bpf_type_flag {
> >         /* DYNPTR points to a ringbuf record. */
> >         DYNPTR_TYPE_RINGBUF     = BIT(9 + BPF_BASE_TYPE_BITS),
> >
> > +       /* MEM is memory owned by a dynptr */
> > +       MEM_DYNPTR              = BIT(10 + BPF_BASE_TYPE_BITS),
>
> do we need this yet another bit? It seems like it only matters for
> verifier log dynptr_ output? Can we just reuse MEM_ALLOC? Or there is
> some ringbuf-specific logic that we'll interfere with? If feels a bit
> unnecessary, let's think if we can avoid adding bits just for this.
I think we do need this bit to differentiate between MEM_ALLOC and
MEM_DYNPTR because otherwise, you would be able to pass in a dynptr
data slice to the ringbuf bpf_ringbuf_submit/discard helpers.
>
> > +
> >         __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_RINGBUF,
> >  };
> >
>
> [...]
>
> > +               if (is_dynptr_ref_function(func_id)) {
> > +                       int i;
> > +
> > +                       /* Find the id of the dynptr we're acquiring a reference to */
> > +                       for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > +                               if (arg_type_is_dynptr(fn->arg_type[i])) {
> > +                                       id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
>
> let's make sure that we have only one such argument?
Are we able to assume it's guaranteed given that
is_dynptr_ref_function() refers to BPF_FUNC_dynptr_data and the
definition of bpf_dynptr_data will always only have one dynptr arg?
>
> > +                                       break;
> > +                               }
> > +                       }
> > +                       if (unlikely(i == MAX_BPF_FUNC_REG_ARGS)) {
>
> please don't use unlikely(), especially for non-performance critical code path
>
> > +                               verbose(env, "verifier internal error: no dynptr args to a dynptr ref function");
> > +                               return -EFAULT;
> > +                       }
> > +               } else {
> > +                       id = acquire_reference_state(env, insn_idx);
> > +                       if (id < 0)
> > +                               return id;
> > +               }
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 0/6] Dynamic pointers
  2022-05-06 22:35 ` [PATCH bpf-next v3 0/6] Dynamic pointers Andrii Nakryiko
@ 2022-05-09 17:26   ` Joanne Koong
  0 siblings, 0 replies; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 17:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 3:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:11 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > This patchset implements the basics of dynamic pointers in bpf.
> >
> > A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> > alongside the address it points to. This abstraction is useful in bpf, given
> > that every memory access in a bpf program must be safe. The verifier and bpf
> > helper functions can use the metadata to enforce safety guarantees for things
> > such as dynamically sized strings and kernel heap allocations.
> >
> > From the program side, the bpf_dynptr is an opaque struct and the verifier
> > will enforce that its contents are never written to by the program.
> > It can only be written to through specific bpf helper functions.
> >
> > There are several uses cases for dynamic pointers in bpf programs. A list of
> > some are: dynamically sized ringbuf reservations without any extra memcpys,
> > dynamic string parsing and memory comparisons, dynamic memory allocations that
> > can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> > data.
> >
> > At a high-level, the patches are as follows:
> > 1/6 - Adds MEM_UNINIT as a bpf_type_flag
> > 2/6 - Adds verifier support for dynptrs and implements malloc dynptrs
> > 3/6 - Adds dynptr support for ring buffers
> > 4/6 - Adds bpf_dynptr_read and bpf_dynptr_write
> > 5/6 - Adds dynptr data slices (ptr to underlying dynptr memory)
> > 6/6 - Tests to check that verifier rejects certain fail cases and passes
> > certain success cases
> >
> > This is the first dynptr patchset in a larger series. The next series of
> > patches will add persisting dynamic memory allocations in maps, parsing packet
> > data through dynptrs, convenience helpers for using dynptrs as iterators, and
> > more helper functions for interacting with strings and memory dynamically.
> >
> > Changelog:
> > ----------
> > v3 -> v2:
> > v2: https://lore.kernel.org/bpf/20220416063429.3314021-1-joannelkoong@gmail.com/
> >
> > * Reorder patches (move ringbuffer patch to be right after the verifier + malloc
> > dynptr patchset)
> > * Remove local type dynptrs (Andrii + Alexei)
> > * Mark stack slot as STACK_MISC after any writes into a dynptr instead of
> > * explicitly prohibiting writes (Alexei)
> > * Pass number of slots, not memory size to is_spi_bounds_valid (Kumar)
> > * Check reference leaks by adding dynptr id to state->refs instead of checking
> > stack slots (Alexei)
> >
>
> There seems to be test_progs-no_alu32 failure in CI ([0]), please take a look
>
>   [0] https://github.com/kernel-patches/bpf/runs/6223168213?check_suite_focus=true#step:6:6430
>
I'll look into this and fix it for v4. thanks for taking the time to
review all these dynptr changes in all the versions, Andrii!
> > v1 -> v2:
> > v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/
> >
> > 1/7 -
> >     * Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
> >       ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
> >     * Drop arg_type_is_mem_ptr() wrapper function (Andrii)
> > 2/7 -
> >     * Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
> >     * Use meta.release_ref instead of ref_obj_id != 0 to determine whether
> >       to release reference (Kumar)
> >     * Drop type_is_release_mem() wrapper function (Andrii)
> > 3/7 -
> >     * Add checks for nested dynptrs edge-cases, which could lead to corrupt
> >     * writes of the dynptr stack variable.
> >     * Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
> >     * Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
> >       (Alexei)
> >     * Support alloc flag __GFP_ZERO (Andrii)
> >     * Reserve upper 8 bits in dynptr size and offset fields instead of
> >       reserving just the upper 4 bits (Andrii)
> >     * Allow dynptr zero-slices (Andrii)
> >     * Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
> >     * Rename check_* functions to is_* functions for better readability
> >       (Andrii)
> >     * Add comment for code that checks the spi bounds (Andrii)
> > 4/7 -
> >     * Fix doc description for bpf_dynpt_read (Toke)
> >     * Move bpf_dynptr_check_off_len() from function patch 1 to here (Andrii)
> > 5/7 -
> >     * When finding the id for the dynptr to associate the data slice with,
> >       look for dynptr arg instead of assuming it is BPF_REG_1.
> > 6/7 -
> >     * Add __force when casting from unsigned long to void * (kernel test robot)
> >     * Expand on docs for ringbuf dynptr APIs (Andrii)
> > 7/7 -
> >     * Use table approach for defining test programs and error messages (Andrii)
> >     * Print out full log if there’s an error (Andrii)
> >     * Use bpf_object__find_program_by_name() instead of specifying
> >       program name as a string (Andrii)
> >     * Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
> >       invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
> >       and test_alloc_zero_bytes
> >     * Add checking for edge cases (eg allocing with invalid flags)
> >
> >
> > Joanne Koong (6):
> >   bpf: Add MEM_UNINIT as a bpf_type_flag
> >   bpf: Add verifier support for dynptrs and implement malloc dynptrs
> >   bpf: Dynptr support for ring buffers
> >   bpf: Add bpf_dynptr_read and bpf_dynptr_write
> >   bpf: Add dynptr data slices
> >   bpf: Dynptr tests
> >
> >  include/linux/bpf.h                           | 103 +++-
> >  include/linux/bpf_verifier.h                  |  21 +
> >  include/uapi/linux/bpf.h                      |  96 +++
> >  kernel/bpf/helpers.c                          | 169 +++++-
> >  kernel/bpf/ringbuf.c                          |  71 +++
> >  kernel/bpf/verifier.c                         | 329 +++++++++-
> >  scripts/bpf_doc.py                            |   2 +
> >  tools/include/uapi/linux/bpf.h                |  96 +++
> >  .../testing/selftests/bpf/prog_tests/dynptr.c | 132 ++++
> >  .../testing/selftests/bpf/progs/dynptr_fail.c | 574 ++++++++++++++++++
> >  .../selftests/bpf/progs/dynptr_success.c      | 218 +++++++
> >  11 files changed, 1774 insertions(+), 37 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
> >
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices
  2022-05-09 17:21     ` Joanne Koong
@ 2022-05-09 18:29       ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 18:29 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, May 9, 2022 at 10:22 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, May 6, 2022 at 4:57 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > This patch adds a new helper function
> > >
> > > void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);
> > >
> > > which returns a pointer to the underlying data of a dynptr. *len*
> > > must be a statically known value. The bpf program may access the returned
> > > data slice as a normal buffer (eg can do direct reads and writes), since
> > > the verifier associates the length with the returned pointer, and
> > > enforces that no out of bounds accesses occur.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  include/linux/bpf.h            |  4 +++
> > >  include/uapi/linux/bpf.h       | 12 +++++++
> > >  kernel/bpf/helpers.c           | 28 +++++++++++++++
> > >  kernel/bpf/verifier.c          | 64 ++++++++++++++++++++++++++++++----
> > >  tools/include/uapi/linux/bpf.h | 12 +++++++
> > >  5 files changed, 114 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index b276dbf942dd..4d2de868bdbc 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -397,6 +397,9 @@ enum bpf_type_flag {
> > >         /* DYNPTR points to a ringbuf record. */
> > >         DYNPTR_TYPE_RINGBUF     = BIT(9 + BPF_BASE_TYPE_BITS),
> > >
> > > +       /* MEM is memory owned by a dynptr */
> > > +       MEM_DYNPTR              = BIT(10 + BPF_BASE_TYPE_BITS),
> >
> > do we need this yet another bit? It seems like it only matters for
> > verifier log dynptr_ output? Can we just reuse MEM_ALLOC? Or there is
> > some ringbuf-specific logic that we'll interfere with? If feels a bit
> > unnecessary, let's think if we can avoid adding bits just for this.
> I think we do need this bit to differentiate between MEM_ALLOC and
> MEM_DYNPTR because otherwise, you would be able to pass in a dynptr
> data slice to the ringbuf bpf_ringbuf_submit/discard helpers.

Right :( I forgot and missed ARG_PTR_TO_ALLOC_MEM, which relies on
this. No big deal.

As an aside, I regret using super-generic ALLOC_MEM naming for
strictly ringbuf-specific register, RINGBUF_MEM or something would be
better. But we can improve that separately.


> >
> > > +
> > >         __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_RINGBUF,
> > >  };
> > >
> >
> > [...]
> >
> > > +               if (is_dynptr_ref_function(func_id)) {
> > > +                       int i;
> > > +
> > > +                       /* Find the id of the dynptr we're acquiring a reference to */
> > > +                       for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > > +                               if (arg_type_is_dynptr(fn->arg_type[i])) {
> > > +                                       id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
> >
> > let's make sure that we have only one such argument?
> Are we able to assume it's guaranteed given that
> is_dynptr_ref_function() refers to BPF_FUNC_dynptr_data and the
> definition of bpf_dynptr_data will always only have one dynptr arg?

That's why I think we should add this check, to guarantee it. Your
code assumes the first dynptr argument is the only one, completely
ignoring any more potential dynptr arguments if they are there. I
don't think that would be correct if we ever have such helpers with
two dynptrs, so we should be explicit about preventing that.

> >
> > > +                                       break;
> > > +                               }
> > > +                       }
> > > +                       if (unlikely(i == MAX_BPF_FUNC_REG_ARGS)) {
> >
> > please don't use unlikely(), especially for non-performance critical code path
> >
> > > +                               verbose(env, "verifier internal error: no dynptr args to a dynptr ref function");
> > > +                               return -EFAULT;
> > > +                       }
> > > +               } else {
> > > +                       id = acquire_reference_state(env, insn_idx);
> > > +                       if (id < 0)
> > > +                               return id;
> > > +               }
> >
> > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs
  2022-05-06 23:30   ` Andrii Nakryiko
@ 2022-05-09 18:58     ` Joanne Koong
  2022-05-09 19:26       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 18:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 4:30 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > This patch adds the bulk of the verifier work for supporting dynamic
> > pointers (dynptrs) in bpf. This patch implements malloc-type dynptrs
> > through 2 new APIs (bpf_dynptr_alloc and bpf_dynptr_put) that can be
> > called by a bpf program. Malloc-type dynptrs are dynptrs that dynamically
> > allocate memory on behalf of the program.
> >
> > A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
> > defined internally as:
> >
> > struct bpf_dynptr_kern {
> >     void *data;
> >     u32 size;
> >     u32 offset;
> > } __aligned(8);
> >
> > The upper 8 bits of *size* is reserved (it contains extra metadata about
> > read-only status and dynptr type); consequently, a dynptr only supports
> > memory less than 16 MB.
> >
> > The 2 new APIs for malloc-type dynptrs are:
> >
> > long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> > void bpf_dynptr_put(struct bpf_dynptr *ptr);
> >
> > Please note that there *must* be a corresponding bpf_dynptr_put for
> > every bpf_dynptr_alloc (even if the alloc fails). This is enforced
> > by the verifier.
> >
> > In the verifier, dynptr state information will be tracked in stack
> > slots. When the program passes in an uninitialized dynptr
> > (ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
> > to the frame pointer where the dynptr resides at are marked STACK_DYNPTR.
> >
> > For helper functions that take in initialized dynptrs (eg
> > bpf_dynptr_read + bpf_dynptr_write which are added later in this
> > patchset), the verifier enforces that the dynptr has been initialized
> > properly by checking that their corresponding stack slots have been marked
> > as STACK_DYNPTR. Dynptr release functions (eg bpf_dynptr_put) will clear the
> > stack slots. The verifier enforces at program exit that there are no
> > referenced dynptrs that haven't been released.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  60 ++++++++-
> >  include/linux/bpf_verifier.h   |  21 +++
> >  include/uapi/linux/bpf.h       |  30 +++++
> >  kernel/bpf/helpers.c           |  75 +++++++++++
> >  kernel/bpf/verifier.c          | 225 ++++++++++++++++++++++++++++++++-
> >  scripts/bpf_doc.py             |   2 +
> >  tools/include/uapi/linux/bpf.h |  30 +++++
> >  7 files changed, 440 insertions(+), 3 deletions(-)
> >
>
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 4565684839f1..16b7ea54a7e0 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -187,6 +187,11 @@ struct bpf_verifier_stack_elem {
> >                                           POISON_POINTER_DELTA))
> >  #define BPF_MAP_PTR(X)         ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
> >
> > +/* forward declarations */
>
> it's kind of obvious from C syntax, this comment doesn't really add value
>
> > +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> > +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
> > +static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
> > +
>
> [...]
>
> >  static struct bpf_func_state *func(struct bpf_verifier_env *env,
> >                                    const struct bpf_reg_state *reg)
> >  {
> > @@ -646,6 +672,134 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
> >         env->scratched_stack_slots = ~0ULL;
> >  }
> >
> > +#define DYNPTR_TYPE_FLAG_MASK          DYNPTR_TYPE_MALLOC
>
> can this be put near where DYNPTR_TYPE_MALLOC is defined? It's quite
> easy to forget to update this if it's somewhere far away
Sounds great!
>
> > +
> > +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> > +{
> > +       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > +       case DYNPTR_TYPE_MALLOC:
> > +               return BPF_DYNPTR_TYPE_MALLOC;
> > +       default:
> > +               return BPF_DYNPTR_TYPE_INVALID;
> > +       }
> > +}
> > +
>
> [...]
>
> > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +       struct bpf_func_state *state = func(env, reg);
> > +       int spi, i;
> > +
> > +       spi = get_spi(reg->off);
> > +
> > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +               return -EINVAL;
> > +
> > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > +               state->stack[spi].slot_type[i] = STACK_INVALID;
> > +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > +       }
> > +
> > +       /* Invalidate any slices associated with this dynptr */
> > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > +               release_reference(env, state->stack[spi].spilled_ptr.id);
> > +               state->stack[spi].spilled_ptr.id = 0;
> > +               state->stack[spi - 1].spilled_ptr.id = 0;
> > +       }
> > +
> > +       state->stack[spi].spilled_ptr.dynptr.type = 0;
> > +       state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> > +
> > +       state->stack[spi].spilled_ptr.dynptr.first_slot = false;
>
> nit: given you marked slots as STACK_INVALID, we shouldn't even look
> into spilled_ptr.dynptr, so this zeroing out isn't necessary, right?
My concern was that if the stack slot gets reused and we don't zero
this out, then this would lead to some erroneous state. Do you think
this is a valid concern or an unnecessary concern?
>
> > +
> > +       return 0;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +       struct bpf_func_state *state = func(env, reg);
> > +       int spi = get_spi(reg->off);
> > +       int i;
> > +
> > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +               return true;
>
> hm... if spi bounds are invalid, shouldn't that be an error?...
The spi bounds can be invalid if the uninitialized dynptr variable
hasn't yet been allocated on the stack. We do the stack allocation for
it in check_helper_call() after we check all the args (we call this
is_dynptr_reg_valid_uninit function when we check the args).
>
> > +
> > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > +               if (state->stack[spi].slot_type[i] == STACK_DYNPTR ||
> > +                   state->stack[spi - 1].slot_type[i] == STACK_DYNPTR)
> > +                       return false;
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                    enum bpf_arg_type arg_type)
> > +{
> > +       struct bpf_func_state *state = func(env, reg);
> > +       int spi = get_spi(reg->off);
> > +       int i;
> > +
> > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > +           !state->stack[spi].spilled_ptr.dynptr.first_slot)
> > +               return false;
> > +
> > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > +           if (state->stack[spi].slot_type[i] != STACK_DYNPTR ||
> > +               state->stack[spi - 1].slot_type[i] != STACK_DYNPTR)
> > +                   return false;
> > +       }
>
> minor, but seems like it's pretty common operation to set or check all
> those "microslots" to STACK_DYNPTR, would two small helpers be useful
> for this (e.g., is_stack_dynptr() and set_stack_dynptr() or something
> along those lines)?
I think having is_stack_dynptr() might be misleading because
!is_stack_dynptr() does not mean the stack slots are not dynptr.
is_stack_dynptr() is true only if both stack slots are marked as
dynptr, !is_stack_dynptr() is therefore true if one or both slots are
not marked as dynptr, but really the stack slots are truly not dynptr
only if both slots are not marked as dynptr. (for example, if spi is
STACK_DYNPTR and spi - 1 is not, !is_stack_dynptr is true, but really
it is not a stack dynptr only if both slots are not marked as dynptr).

looking through the file, I think there is currently only one place
where we set the stack dynptr (in mark_stack_slots_dynptr) and test if
it is a stack dynptr (in is_dynptr_reg_valid_init), so maybe just
leaving this as is is okay? But I'm also happy to move these to
smaller helpers if you don't think the !is_stack_dynptr() is too
misleading/confusing.
>
> > +
> > +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > +       if (arg_type == ARG_PTR_TO_DYNPTR)
> > +               return true;
> > +
> > +       return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> > +}
> > +
>
> [...]
>
> > @@ -5837,6 +6011,35 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> >
> >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > +       } else if (arg_type_is_dynptr(arg_type)) {
> > +               /* Can't pass in a dynptr at a weird offset */
> > +               if (reg->off % BPF_REG_SIZE) {
> > +                       verbose(env, "cannot pass in non-zero dynptr offset\n");
> > +                       return -EINVAL;
> > +               }
> > +
> > +               if (arg_type & MEM_UNINIT)  {
> > +                       if (!is_dynptr_reg_valid_uninit(env, reg)) {
> > +                               verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> > +                                       arg + BPF_REG_1);
> > +                               return -EINVAL;
> > +                       }
> > +
> > +                       meta->uninit_dynptr_regno = arg + BPF_REG_1;
>
> do we need a check that meta->uninit_dynptr_regno isn't already set?
> I.e., prevent two uninit dynptr in a helper?
I don't think we do because the helper functions only take in one
uninitialized dynptr at the moment, but adding this check would
prevent the case where in the future, there are more than 1
uninitialized dynptr args and this verifier code didn't get modified
to support that. I will add this in for v4.
>
> > +               } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> > +                       const char *err_extra = "";
> > +
> > +                       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > +                       case DYNPTR_TYPE_MALLOC:
> > +                               err_extra = "malloc ";
> > +                               break;
> > +                       default:
> > +                               break;
> > +                       }
> > +                       verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > +                               err_extra, arg + BPF_REG_1);
> > +                       return -EINVAL;
> > +               }
> >         } else if (arg_type_is_alloc_size(arg_type)) {
> >                 if (!tnum_is_const(reg->var_off)) {
> >                         verbose(env, "R%d is not a known constant'\n",
>
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs
  2022-05-09 18:58     ` Joanne Koong
@ 2022-05-09 19:26       ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 19:26 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, May 9, 2022 at 11:58 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, May 6, 2022 at 4:30 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > This patch adds the bulk of the verifier work for supporting dynamic
> > > pointers (dynptrs) in bpf. This patch implements malloc-type dynptrs
> > > through 2 new APIs (bpf_dynptr_alloc and bpf_dynptr_put) that can be
> > > called by a bpf program. Malloc-type dynptrs are dynptrs that dynamically
> > > allocate memory on behalf of the program.
> > >
> > > A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure
> > > defined internally as:
> > >
> > > struct bpf_dynptr_kern {
> > >     void *data;
> > >     u32 size;
> > >     u32 offset;
> > > } __aligned(8);
> > >
> > > The upper 8 bits of *size* is reserved (it contains extra metadata about
> > > read-only status and dynptr type); consequently, a dynptr only supports
> > > memory less than 16 MB.
> > >
> > > The 2 new APIs for malloc-type dynptrs are:
> > >
> > > long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > void bpf_dynptr_put(struct bpf_dynptr *ptr);
> > >
> > > Please note that there *must* be a corresponding bpf_dynptr_put for
> > > every bpf_dynptr_alloc (even if the alloc fails). This is enforced
> > > by the verifier.
> > >
> > > In the verifier, dynptr state information will be tracked in stack
> > > slots. When the program passes in an uninitialized dynptr
> > > (ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding
> > > to the frame pointer where the dynptr resides at are marked STACK_DYNPTR.
> > >
> > > For helper functions that take in initialized dynptrs (eg
> > > bpf_dynptr_read + bpf_dynptr_write which are added later in this
> > > patchset), the verifier enforces that the dynptr has been initialized
> > > properly by checking that their corresponding stack slots have been marked
> > > as STACK_DYNPTR. Dynptr release functions (eg bpf_dynptr_put) will clear the
> > > stack slots. The verifier enforces at program exit that there are no
> > > referenced dynptrs that haven't been released.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  include/linux/bpf.h            |  60 ++++++++-
> > >  include/linux/bpf_verifier.h   |  21 +++
> > >  include/uapi/linux/bpf.h       |  30 +++++
> > >  kernel/bpf/helpers.c           |  75 +++++++++++
> > >  kernel/bpf/verifier.c          | 225 ++++++++++++++++++++++++++++++++-
> > >  scripts/bpf_doc.py             |   2 +
> > >  tools/include/uapi/linux/bpf.h |  30 +++++
> > >  7 files changed, 440 insertions(+), 3 deletions(-)
> > >
> >
> > [...]
> >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 4565684839f1..16b7ea54a7e0 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -187,6 +187,11 @@ struct bpf_verifier_stack_elem {
> > >                                           POISON_POINTER_DELTA))
> > >  #define BPF_MAP_PTR(X)         ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
> > >
> > > +/* forward declarations */
> >
> > it's kind of obvious from C syntax, this comment doesn't really add value
> >
> > > +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> > > +static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
> > > +static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
> > > +
> >
> > [...]
> >
> > >  static struct bpf_func_state *func(struct bpf_verifier_env *env,
> > >                                    const struct bpf_reg_state *reg)
> > >  {
> > > @@ -646,6 +672,134 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
> > >         env->scratched_stack_slots = ~0ULL;
> > >  }
> > >
> > > +#define DYNPTR_TYPE_FLAG_MASK          DYNPTR_TYPE_MALLOC
> >
> > can this be put near where DYNPTR_TYPE_MALLOC is defined? It's quite
> > easy to forget to update this if it's somewhere far away
> Sounds great!
> >
> > > +
> > > +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> > > +{
> > > +       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > > +       case DYNPTR_TYPE_MALLOC:
> > > +               return BPF_DYNPTR_TYPE_MALLOC;
> > > +       default:
> > > +               return BPF_DYNPTR_TYPE_INVALID;
> > > +       }
> > > +}
> > > +
> >
> > [...]
> >
> > > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +{
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       int spi, i;
> > > +
> > > +       spi = get_spi(reg->off);
> > > +
> > > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > > +               return -EINVAL;
> > > +
> > > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > > +               state->stack[spi].slot_type[i] = STACK_INVALID;
> > > +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > > +       }
> > > +
> > > +       /* Invalidate any slices associated with this dynptr */
> > > +       if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
> > > +               release_reference(env, state->stack[spi].spilled_ptr.id);
> > > +               state->stack[spi].spilled_ptr.id = 0;
> > > +               state->stack[spi - 1].spilled_ptr.id = 0;
> > > +       }
> > > +
> > > +       state->stack[spi].spilled_ptr.dynptr.type = 0;
> > > +       state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> > > +
> > > +       state->stack[spi].spilled_ptr.dynptr.first_slot = false;
> >
> > nit: given you marked slots as STACK_INVALID, we shouldn't even look
> > into spilled_ptr.dynptr, so this zeroing out isn't necessary, right?
> My concern was that if the stack slot gets reused and we don't zero
> this out, then this would lead to some erroneous state. Do you think
> this is a valid concern or an unnecessary concern?

It depends whether we have such assumptions elsewhere right now. I
hope not, but if we do, then we might better zero-out when we
initialize the slot. In any case, memset() sounds like a more robust
solution here.

> >
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > +{
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       int spi = get_spi(reg->off);
> > > +       int i;
> > > +
> > > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > > +               return true;
> >
> > hm... if spi bounds are invalid, shouldn't that be an error?...
> The spi bounds can be invalid if the uninitialized dynptr variable
> hasn't yet been allocated on the stack. We do the stack allocation for
> it in check_helper_call() after we check all the args (we call this
> is_dynptr_reg_valid_uninit function when we check the args).
> >
> > > +
> > > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > > +               if (state->stack[spi].slot_type[i] == STACK_DYNPTR ||
> > > +                   state->stack[spi - 1].slot_type[i] == STACK_DYNPTR)
> > > +                       return false;
> > > +       }
> > > +
> > > +       return true;
> > > +}
> > > +
> > > +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > +                                    enum bpf_arg_type arg_type)
> > > +{
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       int spi = get_spi(reg->off);
> > > +       int i;
> > > +
> > > +       if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > > +           !state->stack[spi].spilled_ptr.dynptr.first_slot)
> > > +               return false;
> > > +
> > > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > > +           if (state->stack[spi].slot_type[i] != STACK_DYNPTR ||
> > > +               state->stack[spi - 1].slot_type[i] != STACK_DYNPTR)
> > > +                   return false;
> > > +       }
> >
> > minor, but seems like it's pretty common operation to set or check all
> > those "microslots" to STACK_DYNPTR, would two small helpers be useful
> > for this (e.g., is_stack_dynptr() and set_stack_dynptr() or something
> > along those lines)?
> I think having is_stack_dynptr() might be misleading because
> !is_stack_dynptr() does not mean the stack slots are not dynptr.
> is_stack_dynptr() is true only if both stack slots are marked as
> dynptr, !is_stack_dynptr() is therefore true if one or both slots are
> not marked as dynptr, but really the stack slots are truly not dynptr
> only if both slots are not marked as dynptr. (for example, if spi is
> STACK_DYNPTR and spi - 1 is not, !is_stack_dynptr is true, but really
> it is not a stack dynptr only if both slots are not marked as dynptr).
>
> looking through the file, I think there is currently only one place
> where we set the stack dynptr (in mark_stack_slots_dynptr) and test if
> it is a stack dynptr (in is_dynptr_reg_valid_init), so maybe just
> leaving this as is is okay? But I'm also happy to move these to
> smaller helpers if you don't think the !is_stack_dynptr() is too
> misleading/confusing.

it's fine as is, this was minor thing

> >
> > > +
> > > +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > > +       if (arg_type == ARG_PTR_TO_DYNPTR)
> > > +               return true;
> > > +
> > > +       return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> > > +}
> > > +
> >
> > [...]
> >
> > > @@ -5837,6 +6011,35 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > >
> > >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > +       } else if (arg_type_is_dynptr(arg_type)) {
> > > +               /* Can't pass in a dynptr at a weird offset */
> > > +               if (reg->off % BPF_REG_SIZE) {
> > > +                       verbose(env, "cannot pass in non-zero dynptr offset\n");
> > > +                       return -EINVAL;
> > > +               }
> > > +
> > > +               if (arg_type & MEM_UNINIT)  {
> > > +                       if (!is_dynptr_reg_valid_uninit(env, reg)) {
> > > +                               verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> > > +                                       arg + BPF_REG_1);
> > > +                               return -EINVAL;
> > > +                       }
> > > +
> > > +                       meta->uninit_dynptr_regno = arg + BPF_REG_1;
> >
> > do we need a check that meta->uninit_dynptr_regno isn't already set?
> > I.e., prevent two uninit dynptr in a helper?
> I don't think we do because the helper functions only take in one
> uninitialized dynptr at the moment, but adding this check would
> prevent the case where in the future, there are more than 1
> uninitialized dynptr args and this verifier code didn't get modified
> to support that. I will add this in for v4.

That's the point of such checks: to guarantee that currently we don't
have helper definition that would break verifier with unexpected
argument combinations. If or when we extend this "only one out dynptr"
rule, we'll need to update various places in verifier to support that,
including this one. That's expected and good.

> >
> > > +               } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> > > +                       const char *err_extra = "";
> > > +
> > > +                       switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > > +                       case DYNPTR_TYPE_MALLOC:
> > > +                               err_extra = "malloc ";
> > > +                               break;
> > > +                       default:
> > > +                               break;
> > > +                       }
> > > +                       verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > > +                               err_extra, arg + BPF_REG_1);
> > > +                       return -EINVAL;
> > > +               }
> > >         } else if (arg_type_is_alloc_size(arg_type)) {
> > >                 if (!tnum_is_const(reg->var_off)) {
> > >                         verbose(env, "R%d is not a known constant'\n",
> >
> > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-05-06 23:41   ` Andrii Nakryiko
@ 2022-05-09 19:44     ` Joanne Koong
  2022-05-09 20:28       ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 19:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Currently, our only way of writing dynamically-sized data into a ring
> > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > memcpy, but it can only safely support reservation sizes that are
> > statically known since the verifier cannot guarantee that the bpf
> > program won’t access memory outside the reserved space.
> >
> > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > reservations without the extra memcpy.
> >
> > There are 3 new APIs:
> >
> > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> >
> > These closely follow the functionalities of the original ringbuf APIs.
> > For example, all ringbuffer dynptrs that have been reserved must be
> > either submitted or discarded before the program exits.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
>
> Looks great! Modulo those four underscores, they are super confusing...
>
> >  include/linux/bpf.h            | 10 ++++-
> >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> >  kernel/bpf/helpers.c           |  6 +++
> >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> >  kernel/bpf/verifier.c          | 18 +++++++--
> >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> >  6 files changed, 171 insertions(+), 4 deletions(-)
> >
[...]
> >
>
> [...]
>
> > + *
> > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > + *     Description
> > + *             Discard reserved ring buffer sample through the dynptr
> > + *             interface. This is a no-op if the dynptr is invalid/null.
> > + *
> > + *             For more information on *flags*, please see
> > + *             'bpf_ringbuf_discard'.
> > + *     Return
> > + *             Nothing. Always succeeds.
> >   */
>
> let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> sure which one is more appropriate, probably just null one), so we can
> check in code whether some reservation was successful without knowing
> bpf_ringbuf_reserve_dynptr()'s return value
I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
(convenience helpers). Do you prefer that this be part of this
patchset instead? If so, do you think this should be part of the 2nd
patch (aka the one where we set up the infra for dynptrs + implement
malloc-type dynptrs) or this ringbuf patch or its own patch?
>
>
[...]
> [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-05-09 19:44     ` Joanne Koong
@ 2022-05-09 20:28       ` Andrii Nakryiko
  2022-05-09 20:35         ` Joanne Koong
  0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 20:28 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > Currently, our only way of writing dynamically-sized data into a ring
> > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > memcpy, but it can only safely support reservation sizes that are
> > > statically known since the verifier cannot guarantee that the bpf
> > > program won’t access memory outside the reserved space.
> > >
> > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > reservations without the extra memcpy.
> > >
> > > There are 3 new APIs:
> > >
> > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > >
> > > These closely follow the functionalities of the original ringbuf APIs.
> > > For example, all ringbuffer dynptrs that have been reserved must be
> > > either submitted or discarded before the program exits.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> >
> > Looks great! Modulo those four underscores, they are super confusing...
> >
> > >  include/linux/bpf.h            | 10 ++++-
> > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > >  kernel/bpf/helpers.c           |  6 +++
> > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > >  kernel/bpf/verifier.c          | 18 +++++++--
> > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > >
> [...]
> > >
> >
> > [...]
> >
> > > + *
> > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > + *     Description
> > > + *             Discard reserved ring buffer sample through the dynptr
> > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > + *
> > > + *             For more information on *flags*, please see
> > > + *             'bpf_ringbuf_discard'.
> > > + *     Return
> > > + *             Nothing. Always succeeds.
> > >   */
> >
> > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > sure which one is more appropriate, probably just null one), so we can
> > check in code whether some reservation was successful without knowing
> > bpf_ringbuf_reserve_dynptr()'s return value
> I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> (convenience helpers). Do you prefer that this be part of this
> patchset instead? If so, do you think this should be part of the 2nd
> patch (aka the one where we set up the infra for dynptrs + implement
> malloc-type dynptrs) or this ringbuf patch or its own patch?

No problem adding it in a follow up patch.

BTW, is it still in the plan to be able to create bpf_dynptr() from
map_value, global variables, etc? I.e., it's a LOCAL dynptr except
memory is not on STACK.

Something like

int k = 123;
struct my_val *v;
struct bpf_dynptr p;

v = bpf_map_lookup_elem(&my_map, &k);
if (!v) return 0;

bpf_dynptr_from_mem(&v->my_data, &p);

/* p points inside my_map's value */

?


> >
> >
> [...]
> > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-05-09 20:28       ` Andrii Nakryiko
@ 2022-05-09 20:35         ` Joanne Koong
  2022-05-09 20:58           ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Joanne Koong @ 2022-05-09 20:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > Currently, our only way of writing dynamically-sized data into a ring
> > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > > memcpy, but it can only safely support reservation sizes that are
> > > > statically known since the verifier cannot guarantee that the bpf
> > > > program won’t access memory outside the reserved space.
> > > >
> > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > > reservations without the extra memcpy.
> > > >
> > > > There are 3 new APIs:
> > > >
> > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > >
> > > > These closely follow the functionalities of the original ringbuf APIs.
> > > > For example, all ringbuffer dynptrs that have been reserved must be
> > > > either submitted or discarded before the program exits.
> > > >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > ---
> > >
> > > Looks great! Modulo those four underscores, they are super confusing...
> > >
> > > >  include/linux/bpf.h            | 10 ++++-
> > > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > > >  kernel/bpf/helpers.c           |  6 +++
> > > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > > >  kernel/bpf/verifier.c          | 18 +++++++--
> > > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > > >
> > [...]
> > > >
> > >
> > > [...]
> > >
> > > > + *
> > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > > + *     Description
> > > > + *             Discard reserved ring buffer sample through the dynptr
> > > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > > + *
> > > > + *             For more information on *flags*, please see
> > > > + *             'bpf_ringbuf_discard'.
> > > > + *     Return
> > > > + *             Nothing. Always succeeds.
> > > >   */
> > >
> > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > > sure which one is more appropriate, probably just null one), so we can
> > > check in code whether some reservation was successful without knowing
> > > bpf_ringbuf_reserve_dynptr()'s return value
> > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> > (convenience helpers). Do you prefer that this be part of this
> > patchset instead? If so, do you think this should be part of the 2nd
> > patch (aka the one where we set up the infra for dynptrs + implement
> > malloc-type dynptrs) or this ringbuf patch or its own patch?
>
> No problem adding it in a follow up patch.
>
> BTW, is it still in the plan to be able to create bpf_dynptr() from
> map_value, global variables, etc? I.e., it's a LOCAL dynptr except
> memory is not on STACK.
>
> Something like
>
> int k = 123;
> struct my_val *v;
> struct bpf_dynptr p;
>
> v = bpf_map_lookup_elem(&my_map, &k);
> if (!v) return 0;
>
> bpf_dynptr_from_mem(&v->my_data, &p);
>
> /* p points inside my_map's value */
>
> ?
The plan is to still support some types of local dynptrs (eg dynptr to
ctx skbuf / xdp data). If it would be useful to also have this for
map_value, we can add this as well (the RCU protects against the map
value being freed out from under the dynptr, I believe).
>
>
> > >
> > >
> > [...]
> > > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers
  2022-05-09 20:35         ` Joanne Koong
@ 2022-05-09 20:58           ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 20:58 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, May 9, 2022 at 1:35 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Mon, May 9, 2022 at 1:28 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, May 9, 2022 at 12:44 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Fri, May 6, 2022 at 4:41 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 2:12 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > Currently, our only way of writing dynamically-sized data into a ring
> > > > > buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> > > > > cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> > > > > memcpy, but it can only safely support reservation sizes that are
> > > > > statically known since the verifier cannot guarantee that the bpf
> > > > > program won’t access memory outside the reserved space.
> > > > >
> > > > > The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> > > > > reservations without the extra memcpy.
> > > > >
> > > > > There are 3 new APIs:
> > > > >
> > > > > long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > > > > void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > > void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
> > > > >
> > > > > These closely follow the functionalities of the original ringbuf APIs.
> > > > > For example, all ringbuffer dynptrs that have been reserved must be
> > > > > either submitted or discarded before the program exits.
> > > > >
> > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > > ---
> > > >
> > > > Looks great! Modulo those four underscores, they are super confusing...
> > > >
> > > > >  include/linux/bpf.h            | 10 ++++-
> > > > >  include/uapi/linux/bpf.h       | 35 +++++++++++++++++
> > > > >  kernel/bpf/helpers.c           |  6 +++
> > > > >  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
> > > > >  kernel/bpf/verifier.c          | 18 +++++++--
> > > > >  tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
> > > > >  6 files changed, 171 insertions(+), 4 deletions(-)
> > > > >
> > > [...]
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > + *
> > > > > + * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
> > > > > + *     Description
> > > > > + *             Discard reserved ring buffer sample through the dynptr
> > > > > + *             interface. This is a no-op if the dynptr is invalid/null.
> > > > > + *
> > > > > + *             For more information on *flags*, please see
> > > > > + *             'bpf_ringbuf_discard'.
> > > > > + *     Return
> > > > > + *             Nothing. Always succeeds.
> > > > >   */
> > > >
> > > > let's also add bpf_dynptr_is_null() (or bpf_dynptr_is_valid(), not
> > > > sure which one is more appropriate, probably just null one), so we can
> > > > check in code whether some reservation was successful without knowing
> > > > bpf_ringbuf_reserve_dynptr()'s return value
> > > I'm planning to add bpf_dynptr_is_null() in the 3rd dynptr patchset
> > > (convenience helpers). Do you prefer that this be part of this
> > > patchset instead? If so, do you think this should be part of the 2nd
> > > patch (aka the one where we set up the infra for dynptrs + implement
> > > malloc-type dynptrs) or this ringbuf patch or its own patch?
> >
> > No problem adding it in a follow up patch.
> >
> > BTW, is it still in the plan to be able to create bpf_dynptr() from
> > map_value, global variables, etc? I.e., it's a LOCAL dynptr except
> > memory is not on STACK.
> >
> > Something like
> >
> > int k = 123;
> > struct my_val *v;
> > struct bpf_dynptr p;
> >
> > v = bpf_map_lookup_elem(&my_map, &k);
> > if (!v) return 0;
> >
> > bpf_dynptr_from_mem(&v->my_data, &p);
> >
> > /* p points inside my_map's value */
> >
> > ?
> The plan is to still support some types of local dynptrs (eg dynptr to
> ctx skbuf / xdp data). If it would be useful to also have this for
> map_value, we can add this as well (the RCU protects against the map
> value being freed out from under the dynptr, I believe).

Yep, I think it's useful and should be pretty straightforward (there
are no self-referential issues as with PTR_TO_STACK).

> >
> >
> > > >
> > > >
> > > [...]
> > > > [...]

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-05-09 20:59 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-28 21:10 [PATCH bpf-next v3 0/6] Dynamic pointers Joanne Koong
2022-04-28 21:10 ` [PATCH bpf-next v3 1/6] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
2022-05-06 15:07   ` David Vernet
     [not found]     ` <CAJnrk1Yc7G9BamfcNDGXvhMbHcrebROxN97GPPNENJ9_vGF5XA@mail.gmail.com>
2022-05-06 20:32       ` David Vernet
2022-05-06 22:46         ` Andrii Nakryiko
2022-05-07  1:48           ` David Vernet
2022-05-09 17:10           ` Joanne Koong
2022-05-06 22:41   ` Andrii Nakryiko
2022-04-28 21:10 ` [PATCH bpf-next v3 2/6] bpf: Add verifier support for dynptrs and implement malloc dynptrs Joanne Koong
2022-05-06 23:30   ` Andrii Nakryiko
2022-05-09 18:58     ` Joanne Koong
2022-05-09 19:26       ` Andrii Nakryiko
2022-04-28 21:10 ` [PATCH bpf-next v3 3/6] bpf: Dynptr support for ring buffers Joanne Koong
2022-05-06 23:41   ` Andrii Nakryiko
2022-05-09 19:44     ` Joanne Koong
2022-05-09 20:28       ` Andrii Nakryiko
2022-05-09 20:35         ` Joanne Koong
2022-05-09 20:58           ` Andrii Nakryiko
2022-04-28 21:10 ` [PATCH bpf-next v3 4/6] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
2022-05-06 23:48   ` Andrii Nakryiko
2022-05-09 17:15     ` Joanne Koong
2022-04-28 21:10 ` [PATCH bpf-next v3 5/6] bpf: Add dynptr data slices Joanne Koong
2022-05-06 23:57   ` Andrii Nakryiko
2022-05-09 17:21     ` Joanne Koong
2022-05-09 18:29       ` Andrii Nakryiko
2022-04-28 21:10 ` [PATCH bpf-next v3 6/6] bpf: Dynptr tests Joanne Koong
2022-05-07  0:09   ` Andrii Nakryiko
2022-05-06 22:35 ` [PATCH bpf-next v3 0/6] Dynamic pointers Andrii Nakryiko
2022-05-09 17:26   ` Joanne Koong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.