* [PATCH bpf-next v1 0/7] Dynamic pointers
@ 2022-04-02  1:58 Joanne Koong
  2022-04-02  1:58 ` [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

This patchset implements the basics of dynamic pointers in bpf.

A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
alongside the address it points to. This abstraction is useful in bpf, given
that every memory access in a bpf program must be safe. The verifier and bpf
helper functions can use the metadata to enforce safety guarantees for things 
such as dynamically sized strings and kernel heap allocations.

From the program side, the bpf_dynptr is an opaque struct and the verifier
will enforce that its contents are never written to by the program.
It can only be written to through specific bpf helper functions.

There are several use cases for dynamic pointers in bpf programs. Some
examples are: dynamically sized ringbuf reservations without any extra
memcpys, dynamic string parsing and memory comparisons, dynamic memory
allocations that can be persisted in a map, and dynamic parsing of sk_buff
and xdp_md packet data.

At a high-level, the patches are as follows:
1/7 - Adds MEM_UNINIT as a bpf_type_flag
2/7 - Adds MEM_RELEASE as a bpf_type_flag
3/7 - Adds bpf_dynptr_from_mem, bpf_malloc, and bpf_free
4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
6/7 - Adds dynptr support for ring buffers
7/7 - Tests to check that verifier rejects certain fail cases and passes
certain success cases

This is the first dynptr patchset in a larger series. The next series of
patches will add persisting dynamic memory allocations in maps, parsing packet
data through dynptrs, dynptrs to referenced objects, convenience helpers for
using dynptrs as iterators, and more helper functions for interacting with
strings and memory dynamically.

Joanne Koong (7):
  bpf: Add MEM_UNINIT as a bpf_type_flag
  bpf: Add MEM_RELEASE as a bpf_type_flag
  bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Add dynptr data slices
  bpf: Dynptr support for ring buffers
  bpf: Dynptr tests

 include/linux/bpf.h                           | 107 +++-
 include/linux/bpf_verifier.h                  |  23 +-
 include/uapi/linux/bpf.h                      | 100 ++++
 kernel/bpf/bpf_lsm.c                          |   4 +-
 kernel/bpf/btf.c                              |   3 +-
 kernel/bpf/cgroup.c                           |   4 +-
 kernel/bpf/helpers.c                          | 190 ++++++-
 kernel/bpf/ringbuf.c                          |  75 ++-
 kernel/bpf/stackmap.c                         |   6 +-
 kernel/bpf/verifier.c                         | 406 ++++++++++++--
 kernel/trace/bpf_trace.c                      |  20 +-
 net/core/filter.c                             |  28 +-
 scripts/bpf_doc.py                            |   2 +
 tools/include/uapi/linux/bpf.h                | 100 ++++
 .../testing/selftests/bpf/prog_tests/dynptr.c | 303 ++++++++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 527 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 147 +++++
 17 files changed, 1955 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

-- 
2.30.2



* [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-06 18:33   ` Andrii Nakryiko
  2022-04-02  1:58 ` [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE " Joanne Koong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.

Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.

In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h      | 19 +++++++++++--------
 kernel/bpf/bpf_lsm.c     |  4 ++--
 kernel/bpf/cgroup.c      |  4 ++--
 kernel/bpf/helpers.c     | 12 ++++++------
 kernel/bpf/stackmap.c    |  6 +++---
 kernel/bpf/verifier.c    | 25 ++++++++++---------------
 kernel/trace/bpf_trace.c | 20 ++++++++++----------
 net/core/filter.c        | 26 +++++++++++++-------------
 8 files changed, 57 insertions(+), 59 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..6f2558da9d4a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -342,7 +342,9 @@ enum bpf_type_flag {
 	 */
 	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
+	MEM_UNINIT		= BIT(5 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
 };
 
 /* Max number of base types. */
@@ -361,16 +363,11 @@ enum bpf_arg_type {
 	ARG_CONST_MAP_PTR,	/* const argument used as pointer to bpf_map */
 	ARG_PTR_TO_MAP_KEY,	/* pointer to stack used as map key */
 	ARG_PTR_TO_MAP_VALUE,	/* pointer to stack used as map value */
-	ARG_PTR_TO_UNINIT_MAP_VALUE,	/* pointer to valid memory used to store a map value */
 
-	/* the following constraints used to prototype bpf_memcmp() and other
-	 * functions that access data on eBPF program stack
+	/* Used to prototype bpf_memcmp() and other functions that access data
+	 * on eBPF program stack
 	 */
 	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
-	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
-				 * helper function must fill all bytes or clear
-				 * them in error case.
-				 */
 
 	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
@@ -400,6 +397,12 @@ enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+	/* pointer to valid memory used to store a map value */
+	ARG_PTR_TO_MAP_VALUE_UNINIT	= MEM_UNINIT | ARG_PTR_TO_MAP_VALUE,
+	/* pointer to memory does not need to be initialized, helper function must fill
+	 * all bytes or clear them in error case.
+	 */
+	ARG_PTR_TO_MEM_UNINIT		= MEM_UNINIT | ARG_PTR_TO_MEM,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 064eccba641d..11ebadc82e8d 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &bpf_ima_inode_hash_btf_ids[0],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.allowed	= bpf_ima_inode_hash_allowed,
 };
@@ -112,7 +112,7 @@ static const struct bpf_func_proto bpf_ima_file_hash_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &bpf_ima_file_hash_btf_ids[0],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.allowed	= bpf_ima_inode_hash_allowed,
 };
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..4947e3324480 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1724,7 +1724,7 @@ static const struct bpf_func_proto bpf_sysctl_get_current_value_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
@@ -1744,7 +1744,7 @@ static const struct bpf_func_proto bpf_sysctl_get_new_value_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..cc6d480c5c23 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -103,7 +103,7 @@ const struct bpf_func_proto bpf_map_pop_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE_UNINIT,
 };
 
 BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
@@ -116,7 +116,7 @@ const struct bpf_func_proto bpf_map_peek_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE_UNINIT,
 };
 
 const struct bpf_func_proto bpf_get_prandom_u32_proto = {
@@ -237,7 +237,7 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.func		= bpf_get_current_comm,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE,
 };
 
@@ -616,7 +616,7 @@ const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type      = ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type      = ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type      = ARG_CONST_SIZE,
 };
 
@@ -663,7 +663,7 @@ const struct bpf_func_proto bpf_copy_from_user_proto = {
 	.func		= bpf_copy_from_user,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -693,7 +693,7 @@ const struct bpf_func_proto bpf_copy_from_user_task_proto = {
 	.func		= bpf_copy_from_user_task,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_PTR_TO_BTF_ID,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 34725bfa1e97..24fdda340008 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -465,7 +465,7 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -493,7 +493,7 @@ const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -556,7 +556,7 @@ const struct bpf_func_proto bpf_get_stack_proto_pe = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d175b70067b3..90280d5666be 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5136,8 +5136,7 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
 
 static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 {
-	return base_type(type) == ARG_PTR_TO_MEM ||
-	       base_type(type) == ARG_PTR_TO_UNINIT_MEM;
+	return base_type(type) == ARG_PTR_TO_MEM;
 }
 
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
@@ -5273,7 +5272,6 @@ static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE }
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
 	[ARG_PTR_TO_MAP_VALUE]		= &map_key_value_types,
-	[ARG_PTR_TO_UNINIT_MAP_VALUE]	= &map_key_value_types,
 	[ARG_CONST_SIZE]		= &scalar_types,
 	[ARG_CONST_SIZE_OR_ZERO]	= &scalar_types,
 	[ARG_CONST_ALLOC_SIZE_OR_ZERO]	= &scalar_types,
@@ -5287,7 +5285,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_BTF_ID]		= &btf_ptr_types,
 	[ARG_PTR_TO_SPIN_LOCK]		= &spin_lock_types,
 	[ARG_PTR_TO_MEM]		= &mem_types,
-	[ARG_PTR_TO_UNINIT_MEM]		= &mem_types,
 	[ARG_PTR_TO_ALLOC_MEM]		= &alloc_mem_types,
 	[ARG_PTR_TO_INT]		= &int_ptr_types,
 	[ARG_PTR_TO_LONG]		= &int_ptr_types,
@@ -5451,8 +5448,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return -EACCES;
 	}
 
-	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-	    base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		err = resolve_map_arg_type(env, meta, &arg_type);
 		if (err)
 			return err;
@@ -5528,8 +5524,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-		   base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
@@ -5541,7 +5536,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "invalid map_ptr to access map->value\n");
 			return -EACCES;
 		}
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
+		meta->raw_mode = (arg_type == ARG_PTR_TO_MAP_VALUE_UNINIT);
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
@@ -5572,7 +5567,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MEM);
+		meta->raw_mode = (arg_type == ARG_PTR_TO_MEM_UNINIT);
 	} else if (arg_type_is_mem_size(arg_type)) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
@@ -5894,15 +5889,15 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
 {
 	int count = 0;
 
-	if (fn->arg1_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg1_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg2_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg2_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg3_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg3_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg4_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg4_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg5_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg5_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
 
 	/* We only support one arg being in raw mode at the moment,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7fa2ebc07f60..33e1e824a05a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -175,7 +175,7 @@ const struct bpf_func_proto bpf_probe_read_user_proto = {
 	.func		= bpf_probe_read_user,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -212,7 +212,7 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 	.func		= bpf_probe_read_user_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -238,7 +238,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = {
 	.func		= bpf_probe_read_kernel,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -273,7 +273,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
 	.func		= bpf_probe_read_kernel_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -293,7 +293,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_proto = {
 	.func		= bpf_probe_read_compat,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -312,7 +312,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
 	.func		= bpf_probe_read_compat_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -610,7 +610,7 @@ static const struct bpf_func_proto bpf_perf_event_read_value_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1112,7 +1112,7 @@ static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
 	.func		= bpf_get_branch_snapshot,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
@@ -1406,7 +1406,7 @@ static const struct bpf_func_proto bpf_get_stack_proto_tp = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -1473,7 +1473,7 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
          .gpl_only       = true,
          .ret_type       = RET_INTEGER,
          .arg1_type      = ARG_PTR_TO_CTX,
-         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
+	 .arg2_type      = ARG_PTR_TO_MEM_UNINIT,
          .arg3_type      = ARG_CONST_SIZE,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a7044e98765e..9aafec3a09ed 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1743,7 +1743,7 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1777,7 +1777,7 @@ static const struct bpf_func_proto bpf_flow_dissector_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1821,7 +1821,7 @@ static const struct bpf_func_proto bpf_skb_load_bytes_relative_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
@@ -3943,7 +3943,7 @@ static const struct bpf_func_proto bpf_xdp_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -3970,7 +3970,7 @@ static const struct bpf_func_proto bpf_xdp_store_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -4544,7 +4544,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -4579,7 +4579,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_opt_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
@@ -5386,7 +5386,7 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5420,7 +5420,7 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5544,7 +5544,7 @@ static const struct bpf_func_proto bpf_sock_ops_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5656,7 +5656,7 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
@@ -10741,7 +10741,7 @@ static const struct bpf_func_proto sk_reuseport_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -10759,7 +10759,7 @@ static const struct bpf_func_proto sk_reuseport_load_bytes_relative_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
-- 
2.30.2



* [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE as a bpf_type_flag
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
  2022-04-02  1:58 ` [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-04  7:34   ` Kumar Kartikeya Dwivedi
  2022-04-06 18:42   ` Andrii Nakryiko
  2022-04-02  1:58 ` [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free Joanne Koong
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

Currently, we hardcode in the verifier which functions are release
functions. We have no way of differentiating which argument is the one
to be released (we assume it will always be the first argument).

This patch adds MEM_RELEASE as a bpf_type_flag. This allows us to
determine which argument in the function needs to be released, and
removes having to hardcode a list of release functions into the
verifier.

Please note that currently, we only support one release argument per
helper function. If/when we need to support several release arguments
within a function in the future, MEM_RELEASE makes that possible, since
it provides a way to differentiate which arguments are the release ones.

In the near future, MEM_RELEASE will be used by dynptr helper functions
such as bpf_free.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h          |  4 +++-
 include/linux/bpf_verifier.h |  3 +--
 kernel/bpf/btf.c             |  3 ++-
 kernel/bpf/ringbuf.c         |  4 ++--
 kernel/bpf/verifier.c        | 42 ++++++++++++++++++------------------
 net/core/filter.c            |  2 +-
 6 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6f2558da9d4a..cb9f42866cde 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -344,7 +344,9 @@ enum bpf_type_flag {
 
 	MEM_UNINIT		= BIT(5 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
+	MEM_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= MEM_RELEASE,
 };
 
 /* Max number of base types. */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c1fc4af47f69..7a01adc9e13f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 		      const struct bpf_reg_state *reg, int regno);
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func);
+			   enum bpf_arg_type arg_type, bool arg_release);
 int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 			     u32 regno);
 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0918a39279f6..e5b765a84aec 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5830,7 +5830,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
+		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE,
+					     rel && reg->ref_obj_id);
 		if (ret < 0)
 			return ret;
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..a723aa484ce4 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_submit_proto = {
 	.func		= bpf_ringbuf_submit,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | MEM_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
@@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_discard_proto = {
 	.func		= bpf_ringbuf_discard,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | MEM_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 90280d5666be..80e53303713e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -471,15 +471,12 @@ static bool type_may_be_null(u32 type)
 	return type & PTR_MAYBE_NULL;
 }
 
-/* Determine whether the function releases some resources allocated by another
- * function call. The first reference type argument will be assumed to be
- * released by release_reference().
+/* Determine whether the type releases some resources allocated by a
+ * previous function call.
  */
-static bool is_release_function(enum bpf_func_id func_id)
+static bool type_is_release_mem(u32 type)
 {
-	return func_id == BPF_FUNC_sk_release ||
-	       func_id == BPF_FUNC_ringbuf_submit ||
-	       func_id == BPF_FUNC_ringbuf_discard;
+	return type & MEM_RELEASE;
 }
 
 static bool may_be_acquire_function(enum bpf_func_id func_id)
@@ -5364,13 +5361,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func)
+			   enum bpf_arg_type arg_type, bool arg_release)
 {
-	bool fixed_off_ok = false, release_reg;
-	enum bpf_reg_type type = reg->type;
+	bool fixed_off_ok = false;
 
-	switch ((u32)type) {
+	switch ((u32)reg->type) {
 	case SCALAR_VALUE:
 	/* Pointer types where reg offset is explicitly allowed: */
 	case PTR_TO_PACKET:
@@ -5393,18 +5388,15 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
-		/* When referenced PTR_TO_BTF_ID is passed to release function,
-		 * it's fixed offset must be 0. We rely on the property that
-		 * only one referenced register can be passed to BPF helpers and
-		 * kfuncs. In the other cases, fixed offset can be non-zero.
+		/* If a referenced PTR_TO_BTF_ID will be released, its fixed offset
+		 * must be 0.
 		 */
-		release_reg = is_release_func && reg->ref_obj_id;
-		if (release_reg && reg->off) {
+		if (arg_release && reg->off) {
 			verbose(env, "R%d must have zero offset when passed to release func\n",
 				regno);
 			return -EINVAL;
 		}
-		/* For release_reg == true, fixed_off_ok must be false, but we
+		/* For arg_release == true, fixed_off_ok must be false, but we
 		 * already checked and rejected reg->off != 0 above, so set to
 		 * true to allow fixed offset for all other cases.
 		 */
@@ -5424,6 +5416,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	enum bpf_arg_type arg_type = fn->arg_type[arg];
 	enum bpf_reg_type type = reg->type;
+	bool arg_release;
 	int err = 0;
 
 	if (arg_type == ARG_DONTCARE)
@@ -5464,7 +5457,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 	if (err)
 		return err;
 
-	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
+	arg_release = type_is_release_mem(arg_type);
+	if (arg_release && !reg->ref_obj_id) {
+		verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
+			regno, arg + 1);
+		return -EINVAL;
+	}
+
+	err = check_func_arg_reg_off(env, reg, regno, arg_type, arg_release);
 	if (err)
 		return err;
 
@@ -6693,7 +6693,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
-	if (is_release_function(func_id)) {
+	if (meta.ref_obj_id) {
 		err = release_reference(env, meta.ref_obj_id);
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
diff --git a/net/core/filter.c b/net/core/filter.c
index 9aafec3a09ed..a935ce7a63bc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
 	.func		= bpf_sk_release,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | MEM_RELEASE,
 };
 
 BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
-- 
2.30.2



* [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
  2022-04-02  1:58 ` [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
  2022-04-02  1:58 ` [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE " Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-06 22:23   ` Andrii Nakryiko
  2022-04-02  1:58 ` [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

This patch adds 3 new APIs and the bulk of the verifier work for
supporting dynamic pointers in bpf.

There are different types of dynptrs. This patch starts with the most
basic ones: dynptrs that reference a program's local memory
(eg a stack variable) and dynptrs that reference memory dynamically
allocated on behalf of the program. If the memory is dynamically
allocated by the program, the program *must* free it before it
exits. This is enforced by the verifier.

The added APIs are:

long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
void bpf_free(struct bpf_dynptr *ptr);

This patch sets up the verifier to support dynptrs. Dynptrs will always
reside on the program's stack frame. As such, their state is tracked
in their corresponding stack slot, which includes the type of dynptr
(DYNPTR_LOCAL vs. DYNPTR_MALLOC).

When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
MEM_UNINIT), the stack slots where the dynptr resides are marked as
STACK_DYNPTR. For helper functions that take in initialized dynptrs (such
as the next patch in this series, which supports dynptr reads/writes), the
verifier enforces that the dynptr has been initialized by checking that its
corresponding stack slots have been marked as STACK_DYNPTR. Dynptr release
functions (eg bpf_free) clear the stack slots. The verifier enforces at program
exit that there are no dynptr stack slots that still need to be released.

There are other constraints that are enforced by the verifier as
well, such as that the dynptr cannot be written to directly by the bpf
program or by non-dynptr helper functions. The last patch in this series
contains tests that trigger different cases that the verifier needs to
successfully reject.
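
The type/size packing used by this patch (the dynptr type lives in the
upper bits of the kern struct's size field) can be illustrated with a
small userspace sketch. The struct layout, constants, and accessor logic
below are copied from the include/linux/bpf.h hunk of this patch; the
fixed-width typedefs and the standalone-compilation framing are
illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel-side layout from this patch: the upper 4 bits of
 * size are reserved, and the dynptr type is stored starting at bit 29.
 */
#define DYNPTR_MAX_SIZE   ((1UL << 28) - 1)
#define DYNPTR_SIZE_MASK  0xFFFFFFF
#define DYNPTR_TYPE_SHIFT 29

enum bpf_dynptr_type {
	BPF_DYNPTR_TYPE_LOCAL,
	BPF_DYNPTR_TYPE_MALLOC,
};

struct bpf_dynptr_kern {
	uint8_t *data;
	uint32_t size;   /* packed: type in high bits, size in low 28 bits */
	uint32_t offset;
};

/* OR the type into the high bits without disturbing the size bits */
static void dynptr_set_type(struct bpf_dynptr_kern *p, enum bpf_dynptr_type t)
{
	p->size |= (uint32_t)t << DYNPTR_TYPE_SHIFT;
}

/* Recover the type by shifting the high bits back down */
static enum bpf_dynptr_type dynptr_get_type(struct bpf_dynptr_kern *p)
{
	return p->size >> DYNPTR_TYPE_SHIFT;
}

/* Mask off the reserved high bits to recover the byte size */
static uint32_t dynptr_get_size(struct bpf_dynptr_kern *p)
{
	return p->size & DYNPTR_SIZE_MASK;
}
```

Because a size of at most 2^28 - 1 never touches the high bits, setting
the type and reading the size back are independent operations.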

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            |  74 ++++++++-
 include/linux/bpf_verifier.h   |  18 +++
 include/uapi/linux/bpf.h       |  40 +++++
 kernel/bpf/helpers.c           |  88 +++++++++++
 kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
 scripts/bpf_doc.py             |   2 +
 tools/include/uapi/linux/bpf.h |  40 +++++
 7 files changed, 521 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cb9f42866cde..e0fcff9f2aee 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -346,7 +346,13 @@ enum bpf_type_flag {
 
 	MEM_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_RELEASE,
+	/* DYNPTR points to a program's local memory (eg stack variable). */
+	DYNPTR_TYPE_LOCAL	= BIT(7 + BPF_BASE_TYPE_BITS),
+
+	/* DYNPTR points to dynamically allocated memory. */
+	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_MALLOC,
 };
 
 /* Max number of base types. */
@@ -390,6 +396,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STACK,	/* pointer to stack */
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
+	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -2396,4 +2403,69 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
 
+/* the implementation of the opaque uapi struct bpf_dynptr */
+struct bpf_dynptr_kern {
+	u8 *data;
+	/* The upper 4 bits are reserved. Bit 28 denotes whether the
+	 * dynptr is read-only. Bits 29-31 denote the dynptr type.
+	 */
+	u32 size;
+	u32 offset;
+} __aligned(8);
+
+enum bpf_dynptr_type {
+	/* Local memory used by the bpf program (eg stack variable) */
+	BPF_DYNPTR_TYPE_LOCAL,
+	/* Memory allocated dynamically by the kernel for the dynptr */
+	BPF_DYNPTR_TYPE_MALLOC,
+};
+
+/* The upper 4 bits of dynptr->size are reserved. Consequently, the
+ * maximum supported size is 2^28 - 1.
+ */
+#define DYNPTR_MAX_SIZE	((1UL << 28) - 1)
+#define DYNPTR_SIZE_MASK	0xFFFFFFF
+#define DYNPTR_TYPE_SHIFT	29
+
+static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size >> DYNPTR_TYPE_SHIFT;
+}
+
+static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
+{
+	ptr->size |= type << DYNPTR_TYPE_SHIFT;
+}
+
+static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_SIZE_MASK;
+}
+
+static inline int bpf_dynptr_check_size(u32 size)
+{
+	if (size == 0)
+		return -EINVAL;
+
+	if (size > DYNPTR_MAX_SIZE)
+		return -E2BIG;
+
+	return 0;
+}
+
+static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
+{
+	u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
+
+	if (len > capacity || offset > capacity - len)
+		return -EINVAL;
+
+	return 0;
+}
+
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size);
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7a01adc9e13f..bc0f105148f9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -72,6 +72,18 @@ struct bpf_reg_state {
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 
+		/* for dynptr stack slots */
+		struct {
+			enum bpf_dynptr_type dynptr_type;
+			/* A dynptr is 16 bytes so it takes up 2 stack slots.
+			 * We need to track which slot is the first slot
+			 * to protect against cases where the user may try to
+			 * pass in an address starting at the second slot of the
+			 * dynptr.
+			 */
+			bool dynptr_first_slot;
+		};
+
 		/* Max size from any of the above. */
 		struct {
 			unsigned long raw1;
@@ -174,9 +186,15 @@ enum bpf_stack_slot_type {
 	STACK_SPILL,      /* register spilled into stack */
 	STACK_MISC,	  /* BPF program wrote some data into this slot */
 	STACK_ZERO,	  /* BPF program wrote constant zero */
+	/* A dynptr is stored in this stack slot. The type of dynptr
+	 * is stored in bpf_stack_state->spilled_ptr.type
+	 */
+	STACK_DYNPTR,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
+#define BPF_DYNPTR_SIZE 16 /* size of a struct bpf_dynptr in bytes */
+#define BPF_DYNPTR_NR_SLOTS 2
 
 struct bpf_stack_state {
 	struct bpf_reg_state spilled_ptr;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..6a57d8a1b882 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,38 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to local memory *data*.
+ *
+ *		For a dynptr to a dynamic memory allocation, please use bpf_malloc
+ *		instead.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *	Return
+ *		0 on success or -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
+ *
+ * long bpf_malloc(u32 size, struct bpf_dynptr *ptr)
+ *	Description
+ *		Dynamically allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_malloc must have a corresponding
+ *		bpf_free, regardless of whether the bpf_malloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
+ *
+ * void bpf_free(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_malloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5371,9 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(dynptr_from_mem),		\
+	FN(malloc),			\
+	FN(free),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6486,6 +6521,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index cc6d480c5c23..ed5a7d9d0a18 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,88 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size)
+{
+	ptr->data = data;
+	ptr->offset = offset;
+	ptr->size = size;
+	bpf_dynptr_set_type(ptr, type);
+}
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
+{
+	memset(ptr, 0, sizeof(*ptr));
+}
+
+BPF_CALL_3(bpf_dynptr_from_mem, void *, data, u32, size, struct bpf_dynptr_kern *, ptr)
+{
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_LOCAL, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
+	.func		= bpf_dynptr_from_mem,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
+};
+
+BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
+{
+	void *data;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	data = kmalloc(size, GFP_ATOMIC);
+	if (!data) {
+		bpf_dynptr_set_null(ptr);
+		return -ENOMEM;
+	}
+
+	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_malloc_proto = {
+	.func		= bpf_malloc,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
+};
+
+BPF_CALL_1(bpf_free, struct bpf_dynptr_kern *, dynptr)
+{
+	kfree(dynptr->data);
+	bpf_dynptr_set_null(dynptr);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_free_proto = {
+	.func		= bpf_free,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_RELEASE,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1426,6 +1508,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_loop_proto;
 	case BPF_FUNC_strncmp:
 		return &bpf_strncmp_proto;
+	case BPF_FUNC_dynptr_from_mem:
+		return &bpf_dynptr_from_mem_proto;
+	case BPF_FUNC_malloc:
+		return &bpf_malloc_proto;
+	case BPF_FUNC_free:
+		return &bpf_free_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 80e53303713e..cb3bcb54d4b4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -479,6 +479,11 @@ static bool type_is_release_mem(u32 type)
 	return type & MEM_RELEASE;
 }
 
+static bool type_is_uninit_mem(u32 type)
+{
+	return type & MEM_UNINIT;
+}
+
 static bool may_be_acquire_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_lookup_tcp ||
@@ -583,6 +588,7 @@ static char slot_type_char[] = {
 	[STACK_SPILL]	= 'r',
 	[STACK_MISC]	= 'm',
 	[STACK_ZERO]	= '0',
+	[STACK_DYNPTR]	= 'd',
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -598,6 +604,18 @@ static void print_liveness(struct bpf_verifier_env *env,
 		verbose(env, "D");
 }
 
+static inline int get_spi(s32 off)
+{
+	return (-off - 1) / BPF_REG_SIZE;
+}
+
+static bool check_spi_bounds(struct bpf_func_state *state, int spi, u32 nr_slots)
+{
+	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+
+	return allocated_slots > spi && nr_slots - 1 <= spi;
+}
+
 static struct bpf_func_state *func(struct bpf_verifier_env *env,
 				   const struct bpf_reg_state *reg)
 {
@@ -649,6 +667,133 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
+static int arg_to_dynptr_type(enum bpf_arg_type arg_type, enum bpf_dynptr_type *dynptr_type)
+{
+	int type = arg_type & (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC);
+
+	switch (type) {
+	case DYNPTR_TYPE_LOCAL:
+		*dynptr_type = BPF_DYNPTR_TYPE_LOCAL;
+		break;
+	case DYNPTR_TYPE_MALLOC:
+		*dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
+		break;
+	default:
+		/* Can't have more than one type set and can't have no
+		 * type set
+		 */
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static bool dynptr_type_refcounted(struct bpf_func_state *state, int spi)
+{
+	enum bpf_dynptr_type type = state->stack[spi].spilled_ptr.dynptr_type;
+
+	return type == BPF_DYNPTR_TYPE_MALLOC;
+}
+
+static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				   enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = cur_func(env);
+	enum bpf_dynptr_type type;
+	int spi, i, err;
+
+	spi = get_spi(reg->off);
+
+	if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	err = arg_to_dynptr_type(arg_type, &type);
+	if (unlikely(err))
+		return err;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_DYNPTR;
+		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
+	}
+
+	state->stack[spi].spilled_ptr.dynptr_type = type;
+	state->stack[spi - 1].spilled_ptr.dynptr_type = type;
+
+	state->stack[spi].spilled_ptr.dynptr_first_slot = true;
+
+	return 0;
+}
+
+static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i;
+
+	spi = get_spi(reg->off);
+
+	if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_INVALID;
+		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
+	}
+
+	state->stack[spi].spilled_ptr.dynptr_type = 0;
+	state->stack[spi].spilled_ptr.dynptr_first_slot = 0;
+	state->stack[spi - 1].spilled_ptr.dynptr_type = 0;
+
+	return 0;
+}
+
+/* Check if the dynptr argument is a proper initialized dynptr */
+static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+			      enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = func(env, reg);
+	enum bpf_dynptr_type expected_type;
+	int spi, err;
+
+	/* Can't pass in a dynptr at a weird offset */
+	if (reg->off % BPF_REG_SIZE)
+		return false;
+
+	spi = get_spi(reg->off);
+
+	if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return false;
+
+	if (!state->stack[spi].spilled_ptr.dynptr_first_slot)
+		return false;
+
+	if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
+		return false;
+
+	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
+	if (arg_type == ARG_PTR_TO_DYNPTR)
+		return true;
+
+	err = arg_to_dynptr_type(arg_type, &expected_type);
+	if (unlikely(err))
+		return false;
+
+	return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
+}
+
+static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
+{
+	int nr_slots, i;
+
+	nr_slots = min(roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE, spi + 1);
+
+	for (i = 0; i < nr_slots; i++) {
+		if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
+			return true;
+	}
+
+	return false;
+}
+
 /* The reg state of a pointer or a bounded scalar was saved when
  * it was spilled to the stack.
  */
@@ -2885,6 +3030,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	}
 
 	mark_stack_slot_scratched(env, spi);
+
+	if (stack_access_into_dynptr(state, spi, size)) {
+		verbose(env, "direct write into dynptr is not permitted\n");
+		return -EINVAL;
+	}
+
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
 		if (dst_reg != BPF_REG_FP) {
@@ -3006,6 +3157,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
+
+		if (*stype == STACK_DYNPTR) {
+			verbose(env, "direct write into dynptr is not permitted\n");
+			return -EINVAL;
+		}
+
 		mark_stack_slot_scratched(env, spi);
 
 		if (!env->allow_ptr_leaks
@@ -5153,6 +5310,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
 	       type == ARG_PTR_TO_LONG;
 }
 
+static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
+{
+	return base_type(type) == ARG_PTR_TO_DYNPTR;
+}
+
+static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
+{
+	return arg_type_is_dynptr(type) && type & MEM_UNINIT;
+}
+
 static int int_ptr_type_to_size(enum bpf_arg_type type)
 {
 	if (type == ARG_PTR_TO_INT)
@@ -5290,6 +5457,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
+	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@@ -5408,6 +5576,15 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
 }
 
+/*
+ * Determines whether the id used for reference tracking is held in a stack slot
+ * or in a register
+ */
+static bool id_in_stack_slot(enum bpf_arg_type arg_type)
+{
+	return arg_type_is_dynptr(arg_type);
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn)
@@ -5458,10 +5635,19 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return err;
 
 	arg_release = type_is_release_mem(arg_type);
-	if (arg_release && !reg->ref_obj_id) {
-		verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
-			regno, arg + 1);
-		return -EINVAL;
+	if (arg_release) {
+		if (id_in_stack_slot(arg_type)) {
+			struct bpf_func_state *state = func(env, reg);
+			int spi = get_spi(reg->off);
+
+			if (!state->stack[spi].spilled_ptr.id)
+				goto unacquired_ref_err;
+		} else if (!reg->ref_obj_id)  {
+unacquired_ref_err:
+			verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
+				regno, arg + 1);
+			return -EINVAL;
+		}
 	}
 
 	err = check_func_arg_reg_off(env, reg, regno, arg_type, arg_release);
@@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
 		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
+	} else if (arg_type_is_dynptr(arg_type)) {
+		bool initialized = check_dynptr_init(env, reg, arg_type);
+
+		if (type_is_uninit_mem(arg_type)) {
+			if (initialized) {
+				verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
+					arg + 1);
+				return -EINVAL;
+			}
+			meta->raw_mode = true;
+			err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
+			/* For now, we do not allow dynptrs to point to existing
+			 * refcounted memory
+			 */
+			if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {
+				verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
+					arg + 1);
+				return -EINVAL;
+			}
+		} else {
+			if (!initialized) {
+				char *err_extra = "";
+
+				if (arg_type & DYNPTR_TYPE_LOCAL)
+					err_extra = "local ";
+				else if (arg_type & DYNPTR_TYPE_MALLOC)
+					err_extra = "malloc ";
+				verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
+					err_extra, arg + 1);
+				return -EINVAL;
+			}
+			if (type_is_release_mem(arg_type))
+				err = unmark_stack_slots_dynptr(env, reg);
+		}
 	} else if (arg_type_is_alloc_size(arg_type)) {
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
@@ -6552,6 +6772,25 @@ static int check_reference_leak(struct bpf_verifier_env *env)
 	return state->acquired_refs ? -EINVAL : 0;
 }
 
+/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
+ * not been released. Dynptrs to local memory do not need to be released.
+ */
+static int check_dynptr_unreleased(struct bpf_verifier_env *env)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int i;
+
+	for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
+		if (state->stack[i].slot_type[0] == STACK_DYNPTR &&
+		    dynptr_type_refcounted(state, i)) {
+			verbose(env, "spi=%d is an unreleased dynptr\n", i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *regs)
 {
@@ -6693,6 +6932,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
+	regs = cur_regs(env);
+
+	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
+		if (arg_type_is_dynptr_uninit(fn->arg_type[i]))
+			err = mark_stack_slots_dynptr(env, &regs[BPF_REG_1 + i],
+						      fn->arg_type[i]);
+	}
+
 	if (meta.ref_obj_id) {
 		err = release_reference(env, meta.ref_obj_id);
 		if (err) {
@@ -6702,8 +6949,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
-	regs = cur_regs(env);
-
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env);
@@ -6711,6 +6956,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			verbose(env, "tail_call would lead to reference leak\n");
 			return err;
 		}
+		err = check_dynptr_unreleased(env);
+		if (err) {
+			verbose(env, "tail_call would lead to dynptr memory leak\n");
+			return err;
+		}
 		break;
 	case BPF_FUNC_get_local_storage:
 		/* check that flags argument in get_local_storage(map, flags) is 0,
@@ -11703,6 +11953,10 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
+				err = check_dynptr_unreleased(env);
+				if (err)
+					return err;
+
 				if (state->curframe) {
 					/* exit from nested function */
 					err = prepare_func_exit(env, &env->insn_idx);
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 096625242475..766dcbc73897 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     ]
     known_types = {
             '...',
@@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..6a57d8a1b882 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,38 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to local memory *data*.
+ *
+ *		For a dynptr to a dynamic memory allocation, please use bpf_malloc
+ *		instead.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *	Return
+ *		0 on success or -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
+ *
+ * long bpf_malloc(u32 size, struct bpf_dynptr *ptr)
+ *	Description
+ *		Dynamically allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_malloc must have a corresponding
+ *		bpf_free, regardless of whether the bpf_malloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
+ *
+ * void bpf_free(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_malloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5371,9 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(dynptr_from_mem),		\
+	FN(malloc),			\
+	FN(free),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6486,6 +6521,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
                   ` (2 preceding siblings ...)
  2022-04-02  1:58 ` [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-02 13:35   ` Toke Høiland-Jørgensen
  2022-04-06 22:32   ` Andrii Nakryiko
  2022-04-02  1:58 ` [PATCH bpf-next v1 5/7] bpf: Add dynptr data slices Joanne Koong
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

This patch adds two helper functions, bpf_dynptr_read and
bpf_dynptr_write:

long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);

long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);

The dynptr passed into these functions must be valid dynptrs that have
been initialized.
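
Both helpers reuse the bounds check added in patch 3/7 before touching
memory: the access must fit in the dynptr's remaining capacity, i.e. its
size minus its internal offset. Here is a hedged userspace sketch of that
check; the function body follows bpf_dynptr_check_off_len from the
include/linux/bpf.h hunk of patch 3/7, while the flattened parameter list
and the -22 stand-in for -EINVAL are illustrative choices for this sketch.

```c
#include <stdint.h>

/* Returns 0 if reading/writing `len` bytes at user-supplied `offset`
 * stays within the dynptr's capacity (size - internal ptr_off),
 * -EINVAL otherwise. Checking `len > capacity` first guarantees that
 * `capacity - len` cannot wrap around in unsigned arithmetic.
 */
static int dynptr_check_off_len(uint32_t size, uint32_t ptr_off,
				uint32_t offset, uint32_t len)
{
	uint32_t capacity = size - ptr_off;

	if (len > capacity || offset > capacity - len)
		return -22; /* -EINVAL */

	return 0;
}
```

Splitting the check into two comparisons instead of testing
`offset + len > capacity` avoids the overflow that the single-expression
form would allow when `offset + len` wraps past UINT32_MAX.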

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            |  6 ++++
 include/uapi/linux/bpf.h       | 18 +++++++++++
 kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 18 +++++++++++
 4 files changed, 98 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e0fcff9f2aee..cded9753fb7f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2426,6 +2426,12 @@ enum bpf_dynptr_type {
 #define DYNPTR_MAX_SIZE	((1UL << 28) - 1)
 #define DYNPTR_SIZE_MASK	0xFFFFFFF
 #define DYNPTR_TYPE_SHIFT	29
+#define DYNPTR_RDONLY_BIT	BIT(28)
+
+static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_RDONLY_BIT;
+}
 
 static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
 {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6a57d8a1b882..16a35e46be90 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5175,6 +5175,22 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is not writeable.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5374,6 +5390,8 @@ union bpf_attr {
 	FN(dynptr_from_mem),		\
 	FN(malloc),			\
 	FN(free),			\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ed5a7d9d0a18..7ec20e79928e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1412,6 +1412,58 @@ const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
 	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
 };
 
+BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
+{
+	int err;
+
+	if (!src->data)
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(src, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst, src->data + src->offset + offset, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_read_proto = {
+	.func		= bpf_dynptr_read,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_PTR_TO_DYNPTR,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src, u32, len)
+{
+	int err;
+
+	if (!dst->data || bpf_dynptr_is_rdonly(dst))
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(dst, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst->data + dst->offset + offset, src, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_write_proto = {
+	.func		= bpf_dynptr_write,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
 {
 	void *data;
@@ -1514,6 +1566,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_malloc_proto;
 	case BPF_FUNC_free:
 		return &bpf_free_proto;
+	case BPF_FUNC_dynptr_read:
+		return &bpf_dynptr_read_proto;
+	case BPF_FUNC_dynptr_write:
+		return &bpf_dynptr_write_proto;
 	default:
 		break;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6a57d8a1b882..16a35e46be90 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5175,6 +5175,22 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is not writeable.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5374,6 +5390,8 @@ union bpf_attr {
 	FN(dynptr_from_mem),		\
 	FN(malloc),			\
 	FN(free),			\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v1 5/7] bpf: Add dynptr data slices
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
                   ` (3 preceding siblings ...)
  2022-04-02  1:58 ` [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-02  1:58 ` [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers Joanne Koong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

This patch adds a new helper function

void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);

which returns a pointer to the underlying data of a dynptr. *len* must
be a statically known value. The bpf program may access the returned
data slice as a normal buffer (e.g. it can do direct reads and writes),
since the verifier associates the length with the returned pointer and
enforces that no out-of-bounds accesses occur.

This requires a few additions to the verifier. For every
reference-tracked dynptr that is initialized, we associate an id with
it and attach any data slices to that id. When a release function is
called on a dynptr (e.g. bpf_free), we invalidate all slices that
correspond to that dynptr. This ensures a slice can't be used after
its dynptr has been invalidated.
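
As an illustrative sketch (helper signatures as introduced in this
patchset; the 16-byte size and the access pattern are hypothetical),
a program could use a data slice like this:

	struct bpf_dynptr ptr = {};
	char *data;

	if (bpf_malloc(16, &ptr))
		return 0;

	/* len must be statically known; NULL is returned on any error */
	data = bpf_dynptr_data(&ptr, 0, 16);
	if (data)
		data[0] = 'a';	/* direct access, bounds-checked by the verifier */

	bpf_free(&ptr);		/* also invalidates the data slice */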

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf_verifier.h   |  2 +
 include/uapi/linux/bpf.h       | 12 ++++++
 kernel/bpf/helpers.c           | 28 ++++++++++++++
 kernel/bpf/verifier.c          | 70 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 12 ++++++
 5 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index bc0f105148f9..4862567af5ef 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -100,6 +100,8 @@ struct bpf_reg_state {
 	 * for the purpose of tracking that it's freed.
 	 * For PTR_TO_SOCKET this is used to share which pointers retain the
 	 * same reference to the socket, to determine proper reference freeing.
+	 * For stack slots that are dynptrs, this is used to track references to
+	 * the dynptr to enforce proper reference freeing.
 	 */
 	u32 id;
 	/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 16a35e46be90..c835e437cb28 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5191,6 +5191,17 @@ union bpf_attr {
  *	Return
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is not writeable.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the ptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		are out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5392,6 +5403,7 @@ union bpf_attr {
 	FN(free),			\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ec20e79928e..c1295fb5d9d4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1412,6 +1412,32 @@ const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
 	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
 };
 
+BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
+{
+	int err;
+
+	if (!ptr->data)
+		return 0;
+
+	err = bpf_dynptr_check_off_len(ptr, offset, len);
+	if (err)
+		return 0;
+
+	if (bpf_dynptr_is_rdonly(ptr))
+		return 0;
+
+	return (unsigned long)(ptr->data + ptr->offset + offset);
+}
+
+const struct bpf_func_proto bpf_dynptr_data_proto = {
+	.func		= bpf_dynptr_data,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
+};
+
 BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
 {
 	int err;
@@ -1570,6 +1596,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_read_proto;
 	case BPF_FUNC_dynptr_write:
 		return &bpf_dynptr_write_proto;
+	case BPF_FUNC_dynptr_data:
+		return &bpf_dynptr_data_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index cb3bcb54d4b4..7352ffb4f9a5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -187,6 +187,11 @@ struct bpf_verifier_stack_elem {
 					  POISON_POINTER_DELTA))
 #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
 
+/* forward declarations */
+static void release_reg_references(struct bpf_verifier_env *env,
+				   struct bpf_func_state *state,
+				   int ref_obj_id);
+
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
 	return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
@@ -523,6 +528,11 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_skc_to_tcp_request_sock;
 }
 
+static inline bool is_dynptr_ref_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_dynptr_data;
+}
+
 static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 {
 	return BPF_CLASS(insn->code) == BPF_STX &&
@@ -700,7 +710,7 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 {
 	struct bpf_func_state *state = cur_func(env);
 	enum bpf_dynptr_type type;
-	int spi, i, err;
+	int spi, id, i, err;
 
 	spi = get_spi(reg->off);
 
@@ -721,12 +731,27 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 
 	state->stack[spi].spilled_ptr.dynptr_first_slot = true;
 
+	/* Generate an id for the dynptr if the dynptr type can be
+	 * acquired/released.
+	 *
+	 * This is used to associate data slices with dynptrs, so that
+	 * if a dynptr gets invalidated, its data slices will also be
+	 * invalidated.
+	 */
+	if (dynptr_type_refcounted(state, spi)) {
+		id = ++env->id_gen;
+		state->stack[spi].spilled_ptr.id = id;
+		state->stack[spi - 1].spilled_ptr.id = id;
+	}
+
 	return 0;
 }
 
 static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = func(env, reg);
+	bool refcounted;
 	int spi, i;
 
 	spi = get_spi(reg->off);
@@ -734,6 +759,8 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
 	if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
 		return -EINVAL;
 
+	refcounted = dynptr_type_refcounted(state, spi);
+
 	for (i = 0; i < BPF_REG_SIZE; i++) {
 		state->stack[spi].slot_type[i] = STACK_INVALID;
 		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
@@ -743,6 +770,15 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
 	state->stack[spi].spilled_ptr.dynptr_first_slot = 0;
 	state->stack[spi - 1].spilled_ptr.dynptr_type = 0;
 
+	/* Invalidate any slices associated with this dynptr */
+	if (refcounted) {
+		for (i = 0; i <= vstate->curframe; i++)
+			release_reg_references(env, vstate->frame[i],
+					       state->stack[spi].spilled_ptr.id);
+		state->stack[spi].spilled_ptr.id = 0;
+		state->stack[spi - 1].spilled_ptr.id = 0;
+	}
+
 	return 0;
 }
 
@@ -780,6 +816,19 @@ static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state
 	return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
 }
 
+static bool is_ref_obj_id_dynptr(struct bpf_func_state *state, u32 id)
+{
+	int i;
+
+	for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
+		if (state->stack[i].slot_type[0] == STACK_DYNPTR &&
+		    state->stack[i].spilled_ptr.id == id)
+			return true;
+	}
+
+	return false;
+}
+
 static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
 {
 	int nr_slots, i;
@@ -5585,6 +5634,14 @@ static bool id_in_stack_slot(enum bpf_arg_type arg_type)
 	return arg_type_is_dynptr(arg_type);
 }
 
+static inline u32 stack_slot_get_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+
+	return state->stack[spi].spilled_ptr.id;
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn)
@@ -7114,6 +7171,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		regs[BPF_REG_0].id = id;
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = id;
+	} else if (is_dynptr_ref_function(func_id)) {
+		/* Retrieve the id of the associated dynptr. */
+		int id = stack_slot_get_id(env, &regs[BPF_REG_1]);
+
+		if (id < 0)
+			return id;
+		regs[BPF_REG_0].id = id;
+		regs[BPF_REG_0].ref_obj_id = id;
 	}
 
 	do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
@@ -9545,7 +9610,8 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	u32 id = regs[regno].id;
 	int i;
 
-	if (ref_obj_id && ref_obj_id == id && is_null)
+	if (ref_obj_id && ref_obj_id == id && is_null &&
+	    !is_ref_obj_id_dynptr(state, id))
 		/* regs[regno] is in the " == NULL" branch.
 		 * No one could have freed the reference state before
 		 * doing the NULL check.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 16a35e46be90..c835e437cb28 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5191,6 +5191,17 @@ union bpf_attr {
  *	Return
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is not writeable.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the ptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		are out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5392,6 +5403,7 @@ union bpf_attr {
 	FN(free),			\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
                   ` (4 preceding siblings ...)
  2022-04-02  1:58 ` [PATCH bpf-next v1 5/7] bpf: Add dynptr data slices Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-02  6:40   ` kernel test robot
  2022-04-06 22:50   ` Andrii Nakryiko
  2022-04-02  1:58 ` [PATCH bpf-next v1 7/7] bpf: Dynptr tests Joanne Koong
  2022-04-06 23:13 ` [PATCH bpf-next v1 0/7] Dynamic pointers Andrii Nakryiko
  7 siblings, 2 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output, but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_submit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known, since the verifier cannot guarantee that the bpf
program won't access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionality of the original ringbuf APIs.
For example, every ringbuf dynptr that has been reserved must be
either submitted or discarded before the program exits.
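
As a sketch of the intended usage (assuming a BPF_MAP_TYPE_RINGBUF map
named ringbuf, and a runtime-computed size and source buffer, all
hypothetical here):

	struct bpf_dynptr ptr;

	if (bpf_ringbuf_reserve_dynptr(&ringbuf, size, 0, &ptr)) {
		/* ptr was set to NULL; discarding a NULL dynptr is a no-op */
		bpf_ringbuf_discard_dynptr(&ptr, 0);
		return 0;
	}

	/* fill the reservation through the dynptr interface */
	bpf_dynptr_write(&ptr, 0, data, size);

	bpf_ringbuf_submit_dynptr(&ptr, 0);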

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 10 ++++-
 include/uapi/linux/bpf.h       | 30 ++++++++++++++
 kernel/bpf/helpers.c           |  6 +++
 kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          | 17 ++++++--
 tools/include/uapi/linux/bpf.h | 30 ++++++++++++++
 6 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cded9753fb7f..2672360172c5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -352,7 +352,10 @@ enum bpf_type_flag {
 	/* DYNPTR points to dynamically allocated memory. */
 	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_MALLOC,
+	/* DYNPTR points to a ringbuf record. */
+	DYNPTR_TYPE_RINGBUF	= BIT(9 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_RINGBUF,
 };
 
 /* Max number of base types. */
@@ -2255,6 +2258,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
 extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
 extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
 extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
@@ -2418,6 +2424,8 @@ enum bpf_dynptr_type {
 	BPF_DYNPTR_TYPE_LOCAL,
 	/* Memory allocated dynamically by the kernel for the dynptr */
 	BPF_DYNPTR_TYPE_MALLOC,
+	/* Underlying data is a ringbuf record */
+	BPF_DYNPTR_TYPE_RINGBUF,
 };
 
 /* The upper 4 bits of dynptr->size are reserved. Consequently, the
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c835e437cb28..778de0b052c1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5202,6 +5202,33 @@ union bpf_attr {
  *		Pointer to the underlying dynptr data, NULL if the ptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
 *		are out of bounds.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *ptr*,
+ *		through the dynptr interface.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5404,6 +5431,9 @@ union bpf_attr {
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
 	FN(dynptr_data),		\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c1295fb5d9d4..7b1e467f0504 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1580,6 +1580,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_ringbuf_discard_proto;
 	case BPF_FUNC_ringbuf_query:
 		return &bpf_ringbuf_query_proto;
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+		return &bpf_ringbuf_reserve_dynptr_proto;
+	case BPF_FUNC_ringbuf_submit_dynptr:
+		return &bpf_ringbuf_submit_dynptr_proto;
+	case BPF_FUNC_ringbuf_discard_dynptr:
+		return &bpf_ringbuf_discard_dynptr_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_loop:
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index a723aa484ce4..cdbeeb4819ae 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -475,3 +475,74 @@ const struct bpf_func_proto bpf_ringbuf_query_proto = {
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
+	   struct bpf_dynptr_kern *, ptr)
+{
+	void *sample;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	sample = (void *)____bpf_ringbuf_reserve(map, size, flags);
+
+	if (!sample) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
+	.func		= bpf_ringbuf_reserve_dynptr,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
+};
+
+BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_submit(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
+	.func		= bpf_ringbuf_submit_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_discard(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
+	.func		= bpf_ringbuf_discard_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7352ffb4f9a5..6336476eac7d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -679,7 +679,7 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 
 static int arg_to_dynptr_type(enum bpf_arg_type arg_type, enum bpf_dynptr_type *dynptr_type)
 {
-	int type = arg_type & (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC);
+	int type = arg_type & (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC | DYNPTR_TYPE_RINGBUF);
 
 	switch (type) {
 	case DYNPTR_TYPE_LOCAL:
@@ -688,6 +688,9 @@ static int arg_to_dynptr_type(enum bpf_arg_type arg_type, enum bpf_dynptr_type *
 	case DYNPTR_TYPE_MALLOC:
 		*dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
 		break;
+	case DYNPTR_TYPE_RINGBUF:
+		*dynptr_type = BPF_DYNPTR_TYPE_RINGBUF;
+		break;
 	default:
 		/* Can't have more than one type set and can't have no
 		 * type set
@@ -702,7 +705,7 @@ static bool dynptr_type_refcounted(struct bpf_func_state *state, int spi)
 {
 	enum bpf_dynptr_type type = state->stack[spi].spilled_ptr.dynptr_type;
 
-	return type == BPF_DYNPTR_TYPE_MALLOC;
+	return type == BPF_DYNPTR_TYPE_MALLOC || type == BPF_DYNPTR_TYPE_RINGBUF;
 }
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
@@ -5842,6 +5845,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 					err_extra = "local ";
 				else if (arg_type & DYNPTR_TYPE_MALLOC)
 					err_extra = "malloc ";
+				else if (arg_type & DYNPTR_TYPE_RINGBUF)
+					err_extra = "ringbuf ";
 				verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
 					err_extra, arg + 1);
 				return -EINVAL;
@@ -5966,7 +5971,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_RINGBUF:
 		if (func_id != BPF_FUNC_ringbuf_output &&
 		    func_id != BPF_FUNC_ringbuf_reserve &&
-		    func_id != BPF_FUNC_ringbuf_query)
+		    func_id != BPF_FUNC_ringbuf_query &&
+		    func_id != BPF_FUNC_ringbuf_reserve_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_submit_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_discard_dynptr)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -6082,6 +6090,9 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_ringbuf_output:
 	case BPF_FUNC_ringbuf_reserve:
 	case BPF_FUNC_ringbuf_query:
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+	case BPF_FUNC_ringbuf_submit_dynptr:
+	case BPF_FUNC_ringbuf_discard_dynptr:
 		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
 			goto error;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c835e437cb28..778de0b052c1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5202,6 +5202,33 @@ union bpf_attr {
  *		Pointer to the underlying dynptr data, NULL if the ptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
 *		are out of bounds.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *ptr*,
+ *		through the dynptr interface.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5404,6 +5431,9 @@ union bpf_attr {
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
 	FN(dynptr_data),		\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v1 7/7] bpf: Dynptr tests
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
                   ` (5 preceding siblings ...)
  2022-04-02  1:58 ` [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers Joanne Koong
@ 2022-04-02  1:58 ` Joanne Koong
  2022-04-06 23:11   ` Andrii Nakryiko
  2022-04-06 23:13 ` [PATCH bpf-next v1 0/7] Dynamic pointers Andrii Nakryiko
  7 siblings, 1 reply; 32+ messages in thread
From: Joanne Koong @ 2022-04-02  1:58 UTC (permalink / raw)
  To: bpf; +Cc: andrii, ast, daniel, Joanne Koong

From: Joanne Koong <joannelkoong@gmail.com>

This patch adds tests for dynptrs. These include scenarios that the
verifier needs to reject, as well as some successful use cases of
dynptrs that should pass.

The failure scenarios include invalid bpf_free calls, invalid writes,
invalid reads, and invalid ringbuf API usage.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/dynptr.c | 303 ++++++++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 527 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 147 +++++
 3 files changed, 977 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
new file mode 100644
index 000000000000..7107ebee3427
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
@@ -0,0 +1,303 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <test_progs.h>
+#include "dynptr_fail.skel.h"
+#include "dynptr_success.skel.h"
+
+size_t log_buf_sz = 1024 * 1024;
+
+enum fail_case {
+	MISSING_FREE,
+	MISSING_FREE_CALLBACK,
+	INVALID_FREE1,
+	INVALID_FREE2,
+	USE_AFTER_FREE,
+	MALLOC_TWICE,
+	INVALID_MAP_CALL1,
+	INVALID_MAP_CALL2,
+	RINGBUF_INVALID_ACCESS,
+	RINGBUF_INVALID_API,
+	RINGBUF_OUT_OF_BOUNDS,
+	DATA_SLICE_OUT_OF_BOUNDS,
+	DATA_SLICE_USE_AFTER_FREE,
+	INVALID_HELPER1,
+	INVALID_HELPER2,
+	INVALID_WRITE1,
+	INVALID_WRITE2,
+	INVALID_WRITE3,
+	INVALID_WRITE4,
+	INVALID_READ1,
+	INVALID_READ2,
+	INVALID_READ3,
+	INVALID_OFFSET,
+	GLOBAL,
+	FREE_TWICE,
+	FREE_TWICE_CALLBACK,
+};
+
+static void verify_fail(enum fail_case fail, char *obj_log_buf,  char *err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct bpf_program *prog;
+	struct dynptr_fail *skel;
+	int err;
+
+	opts.kernel_log_buf = obj_log_buf;
+	opts.kernel_log_size = log_buf_sz;
+	opts.kernel_log_level = 1;
+
+	skel = dynptr_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bpf_object__for_each_program(prog, skel->obj)
+		bpf_program__set_autoload(prog, false);
+
+	/* these programs should all be rejected by the verifier */
+	switch (fail) {
+	case MISSING_FREE:
+		prog = skel->progs.missing_free;
+		break;
+	case MISSING_FREE_CALLBACK:
+		prog = skel->progs.missing_free_callback;
+		break;
+	case INVALID_FREE1:
+		prog = skel->progs.invalid_free1;
+		break;
+	case INVALID_FREE2:
+		prog = skel->progs.invalid_free2;
+		break;
+	case USE_AFTER_FREE:
+		prog = skel->progs.use_after_free;
+		break;
+	case MALLOC_TWICE:
+		prog = skel->progs.malloc_twice;
+		break;
+	case INVALID_MAP_CALL1:
+		prog = skel->progs.invalid_map_call1;
+		break;
+	case INVALID_MAP_CALL2:
+		prog = skel->progs.invalid_map_call2;
+		break;
+	case RINGBUF_INVALID_ACCESS:
+		prog = skel->progs.ringbuf_invalid_access;
+		break;
+	case RINGBUF_INVALID_API:
+		prog = skel->progs.ringbuf_invalid_api;
+		break;
+	case RINGBUF_OUT_OF_BOUNDS:
+		prog = skel->progs.ringbuf_out_of_bounds;
+		break;
+	case DATA_SLICE_OUT_OF_BOUNDS:
+		prog = skel->progs.data_slice_out_of_bounds;
+		break;
+	case DATA_SLICE_USE_AFTER_FREE:
+		prog = skel->progs.data_slice_use_after_free;
+		break;
+	case INVALID_HELPER1:
+		prog = skel->progs.invalid_helper1;
+		break;
+	case INVALID_HELPER2:
+		prog = skel->progs.invalid_helper2;
+		break;
+	case INVALID_WRITE1:
+		prog = skel->progs.invalid_write1;
+		break;
+	case INVALID_WRITE2:
+		prog = skel->progs.invalid_write2;
+		break;
+	case INVALID_WRITE3:
+		prog = skel->progs.invalid_write3;
+		break;
+	case INVALID_WRITE4:
+		prog = skel->progs.invalid_write4;
+		break;
+	case INVALID_READ1:
+		prog = skel->progs.invalid_read1;
+		break;
+	case INVALID_READ2:
+		prog = skel->progs.invalid_read2;
+		break;
+	case INVALID_READ3:
+		prog = skel->progs.invalid_read3;
+		break;
+	case INVALID_OFFSET:
+		prog = skel->progs.invalid_offset;
+		break;
+	case GLOBAL:
+		prog = skel->progs.global;
+		break;
+	case FREE_TWICE:
+		prog = skel->progs.free_twice;
+		break;
+	case FREE_TWICE_CALLBACK:
+		prog = skel->progs.free_twice_callback;
+		break;
+	default:
+		fprintf(stderr, "unknown fail_case\n");
+		return;
+	}
+
+	bpf_program__set_autoload(prog, true);
+
+	err = dynptr_fail__load(skel);
+
+	ASSERT_OK_PTR(strstr(obj_log_buf, err_msg), "err_msg not found");
+
+	ASSERT_ERR(err, "unexpected load success");
+
+	dynptr_fail__destroy(skel);
+}
+
+static void run_prog(struct dynptr_success *skel, struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "bpf program attach"))
+		return;
+
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->err, 0, "err");
+
+	bpf_link__destroy(link);
+}
+
+static void verify_success(void)
+{
+	struct dynptr_success *skel;
+
+	skel = dynptr_success__open();
+
+	skel->bss->pid = getpid();
+
+	dynptr_success__load(skel);
+	if (!ASSERT_OK_PTR(skel, "dynptr__open_and_load"))
+		return;
+
+	run_prog(skel, skel->progs.prog_success);
+	run_prog(skel, skel->progs.prog_success_data_slice);
+	run_prog(skel, skel->progs.prog_success_ringbuf);
+
+	dynptr_success__destroy(skel);
+}
+
+void test_dynptr(void)
+{
+	char *obj_log_buf;
+
+	obj_log_buf = malloc(3 * log_buf_sz);
+	if (!ASSERT_OK_PTR(obj_log_buf, "obj_log_buf"))
+		return;
+	obj_log_buf[0] = '\0';
+
+	if (test__start_subtest("missing_free"))
+		verify_fail(MISSING_FREE, obj_log_buf,
+			    "spi=0 is an unreleased dynptr");
+
+	if (test__start_subtest("missing_free_callback"))
+		verify_fail(MISSING_FREE_CALLBACK, obj_log_buf,
+			    "spi=0 is an unreleased dynptr");
+
+	if (test__start_subtest("invalid_free1"))
+		verify_fail(INVALID_FREE1, obj_log_buf,
+			    "arg #1 is an unacquired reference and hence cannot be released");
+
+	if (test__start_subtest("invalid_free2"))
+		verify_fail(INVALID_FREE2, obj_log_buf, "type=alloc_mem_or_null expected=fp");
+
+	if (test__start_subtest("use_after_free"))
+		verify_fail(USE_AFTER_FREE, obj_log_buf,
+			    "Expected an initialized dynptr as arg #3");
+
+	if (test__start_subtest("malloc_twice"))
+		verify_fail(MALLOC_TWICE, obj_log_buf,
+			    "Arg #2 dynptr cannot be an initialized dynptr");
+
+	if (test__start_subtest("invalid_map_call1"))
+		verify_fail(INVALID_MAP_CALL1, obj_log_buf,
+			    "invalid indirect read from stack");
+
+	if (test__start_subtest("invalid_map_call2"))
+		verify_fail(INVALID_MAP_CALL2, obj_log_buf,
+			    "invalid indirect read from stack");
+
+	if (test__start_subtest("invalid_helper1"))
+		verify_fail(INVALID_HELPER1, obj_log_buf,
+			    "invalid indirect read from stack");
+
+	if (test__start_subtest("ringbuf_invalid_access"))
+		verify_fail(RINGBUF_INVALID_ACCESS, obj_log_buf,
+			    "invalid mem access 'scalar'");
+
+	if (test__start_subtest("ringbuf_invalid_api"))
+		verify_fail(RINGBUF_INVALID_API, obj_log_buf,
+			    "func bpf_ringbuf_submit#132 reference has not been acquired before");
+
+	if (test__start_subtest("ringbuf_out_of_bounds"))
+		verify_fail(RINGBUF_OUT_OF_BOUNDS, obj_log_buf,
+			    "value is outside of the allowed memory range");
+
+	if (test__start_subtest("data_slice_out_of_bounds"))
+		verify_fail(DATA_SLICE_OUT_OF_BOUNDS, obj_log_buf,
+			    "value is outside of the allowed memory range");
+
+	if (test__start_subtest("data_slice_use_after_free"))
+		verify_fail(DATA_SLICE_USE_AFTER_FREE, obj_log_buf,
+			    "invalid mem access 'scalar'");
+
+	if (test__start_subtest("invalid_helper2"))
+		verify_fail(INVALID_HELPER2, obj_log_buf,
+			    "Expected an initialized dynptr as arg #3");
+
+	if (test__start_subtest("invalid_write1"))
+		verify_fail(INVALID_WRITE1, obj_log_buf,
+			    "direct write into dynptr is not permitted");
+
+	if (test__start_subtest("invalid_write2"))
+		verify_fail(INVALID_WRITE2, obj_log_buf,
+			    "direct write into dynptr is not permitted");
+
+	if (test__start_subtest("invalid_write3"))
+		verify_fail(INVALID_WRITE3, obj_log_buf,
+			    "direct write into dynptr is not permitted");
+
+	if (test__start_subtest("invalid_write4"))
+		verify_fail(INVALID_WRITE4, obj_log_buf,
+			    "direct write into dynptr is not permitted");
+
+	if (test__start_subtest("invalid_read1"))
+		verify_fail(INVALID_READ1, obj_log_buf,
+			    "invalid read from stack");
+
+	if (test__start_subtest("invalid_read2"))
+		verify_fail(INVALID_READ2, obj_log_buf,
+			    "Expected an initialized dynptr as arg #3");
+
+	if (test__start_subtest("invalid_read3"))
+		verify_fail(INVALID_READ3, obj_log_buf,
+			    "invalid read from stack");
+
+	if (test__start_subtest("invalid_offset"))
+		verify_fail(INVALID_OFFSET, obj_log_buf,
+			    "invalid indirect access to stack");
+
+	if (test__start_subtest("global"))
+		verify_fail(GLOBAL, obj_log_buf,
+			    "R2 type=map_value expected=fp");
+
+	if (test__start_subtest("free_twice"))
+		verify_fail(FREE_TWICE, obj_log_buf,
+			    "arg #1 is an unacquired reference and hence cannot be released");
+
+	if (test__start_subtest("free_twice_callback"))
+		verify_fail(FREE_TWICE_CALLBACK, obj_log_buf,
+			    "arg #1 is an unacquired reference and hence cannot be released");
+
+	if (test__start_subtest("success"))
+		verify_success();
+
+	free(obj_log_buf);
+}
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
new file mode 100644
index 000000000000..0b19eeb83e36
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -0,0 +1,527 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct bpf_dynptr);
+} array_map SEC(".maps");
+
+struct sample {
+	int pid;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+int err = 0;
+int val;
+
+/* A dynptr can't be used after bpf_free has been called on it */
+SEC("raw_tp/sys_nanosleep")
+int use_after_free(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+
+	bpf_malloc(8, &ptr);
+
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	bpf_free(&ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	return 0;
+}
+
+/* Every bpf_malloc call must have a corresponding bpf_free call */
+SEC("raw_tp/sys_nanosleep")
+int missing_free(void *ctx)
+{
+	struct bpf_dynptr mem;
+
+	bpf_malloc(8, &mem);
+
+	/* missing a call to bpf_free(&mem) */
+
+	return 0;
+}
+
+/* A non-malloc-ed dynptr can't be freed */
+SEC("raw_tp/sys_nanosleep")
+int invalid_free1(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u32 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	/* this should fail */
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+/* A data slice from a dynptr can't be freed */
+SEC("raw_tp/sys_nanosleep")
+int invalid_free2(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+
+	bpf_malloc(8, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+
+	/* this should fail */
+	bpf_free(data);
+
+	return 0;
+}
+
+/*
+ * Can't bpf_malloc an existing malloc-ed bpf_dynptr that hasn't been
+ * freed yet
+ */
+SEC("raw_tp/sys_nanosleep")
+int malloc_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_malloc(8, &ptr);
+
+	/* this should fail */
+	bpf_malloc(2, &ptr);
+
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+/*
+ * Can't access a ring buffer record after submit or discard has been called
+ * on the dynptr
+ */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_invalid_access(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	/* this should fail */
+	err = sample->pid;
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't call non-dynptr ringbuf APIs on a dynptr ringbuf sample */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_invalid_api(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	/* invalid API use. need to use dynptr API to submit/discard */
+	bpf_ringbuf_submit(sample, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't access memory outside a ringbuf record range */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	/* Can't access beyond sample range */
+	*(__u8 *)((void *)sample + sizeof(*sample)) = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't add a dynptr to a map */
+SEC("raw_tp/sys_nanosleep")
+int invalid_map_call1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char buf[64] = {};
+	int key = 0;
+
+	err = bpf_dynptr_from_mem(buf, sizeof(buf), &ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &ptr, 0);
+
+	return 0;
+}
+
+/* Can't add a struct with an embedded dynptr to a map */
+SEC("raw_tp/sys_nanosleep")
+int invalid_map_call2(void *ctx)
+{
+	struct info {
+		int x;
+		struct bpf_dynptr ptr;
+	};
+	struct info x;
+	int key = 0;
+
+	bpf_malloc(8, &x.ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &x, 0);
+
+	return 0;
+}
+
+/* Can't pass in a dynptr as an arg to a helper function that doesn't take in a
+ * dynptr argument
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_helper1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+
+	bpf_malloc(8, &ptr);
+
+	/* this should fail */
+	bpf_strncmp((const char *)&ptr, sizeof(ptr), "hello!");
+
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+/* A dynptr can't be passed into a helper function at a non-zero offset */
+SEC("raw_tp/sys_nanosleep")
+int invalid_helper2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0);
+
+	return 0;
+}
+
+/* A data slice can't be accessed out of bounds */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int data_slice_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_malloc(8, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* can't index out of bounds of the data slice */
+	val = *((char *)data + 8);
+
+done:
+	bpf_free(&ptr);
+	return 0;
+}
+
+/* A data slice can't be used after it's freed */
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int data_slice_use_after_free(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_malloc(8, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	bpf_free(&ptr);
+
+	/* this should fail */
+	val = *(__u8 *)data;
+
+done:
+	bpf_free(&ptr);
+	return 0;
+}
+
+/*
+ * A bpf_dynptr can't be written directly to by the bpf program,
+ * only through dynptr helper functions
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u8 x = 0;
+
+	bpf_malloc(8, &ptr);
+
+	/* this should fail */
+	memcpy(&ptr, &x, sizeof(x));
+
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+/*
+ * A bpf_dynptr at a non-zero offset can't be written directly to by the bpf program,
+ * only through dynptr helper functions
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u8 x = 0, y = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	/* this should fail */
+	memcpy((void *)&ptr, &y, sizeof(y));
+
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	return 0;
+}
+
+/* A non-const write into a dynptr is not permitted */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write3(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char stack_buf[16];
+	unsigned long len;
+	__u8 x = 0;
+
+	bpf_malloc(8, &ptr);
+
+	memcpy(stack_buf, &val, sizeof(val));
+	len = stack_buf[0] & 0xf;
+
+	/* this should fail */
+	memcpy((void *)&ptr + len, &x, sizeof(x));
+
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+static int invalid_write4_callback(__u32 index, void *data)
+{
+	/* this should fail */
+	*(__u32 *)data = 123;
+
+	bpf_free(data);
+
+	return 0;
+}
+
+/* An invalid write can't occur in a callback function */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write4(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	bpf_loop(10, invalid_write4_callback, &ptr, 0);
+
+	return 0;
+}
+
+/* A globally-defined bpf_dynptr can't be used (it must reside on the stack) */
+struct bpf_dynptr global_dynptr;
+SEC("raw_tp/sys_nanosleep")
+int global(void *ctx)
+{
+	/* this should fail */
+	bpf_malloc(4, &global_dynptr);
+
+	bpf_free(&global_dynptr);
+
+	return 0;
+}
+
+/* A direct read should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u32 x = 2;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	/* this should fail */
+	val = *(int *)&ptr;
+
+	return 0;
+}
+
+/* A direct read at an offset should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 1, 0);
+
+	return 0;
+}
+
+/* A direct read at an offset into the lower stack slot should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read3(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	struct bpf_dynptr ptr2 = {};
+	__u32 x = 2;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr2);
+
+	/* this should fail */
+	memcpy(&val, (void *)&ptr + 8, sizeof(val));
+
+	return 0;
+}
+
+/* Calling bpf_dynptr_from_mem on an offset should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_offset(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u64 x = 0;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(&x, sizeof(x), &ptr + 1);
+
+	return 0;
+}
+
+/* A malloc-ed dynptr can't be freed twice */
+SEC("raw_tp/sys_nanosleep")
+int free_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_malloc(8, &ptr);
+
+	bpf_free(&ptr);
+
+	/* this second free should fail */
+	bpf_free(&ptr);
+
+	return 0;
+}
+
+static int free_twice_callback_fn(__u32 index, void *data)
+{
+	/* this should fail */
+	bpf_free(data);
+	val = index;
+	return 0;
+}
+
+/* Test that freeing a malloc-ed dynptr twice, where the second free happens
+ * within a callback function, fails
+ */
+SEC("raw_tp/sys_nanosleep")
+int free_twice_callback(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_malloc(8, &ptr);
+
+	bpf_free(&ptr);
+
+	bpf_loop(10, free_twice_callback_fn, &ptr, 0);
+
+	return 0;
+}
+
+static int missing_free_callback_fn(__u32 index, void *data)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_malloc(8, &ptr);
+
+	val = index;
+
+	/* missing bpf_free(&ptr) */
+
+	return 0;
+}
+
+/* Any dynptr initialized within a callback must be freed */
+SEC("raw_tp/sys_nanosleep")
+int missing_free_callback(void *ctx)
+{
+	bpf_loop(10, missing_free_callback_fn, NULL, 0);
+	return 0;
+}
+
diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c
new file mode 100644
index 000000000000..1b605bbc17f3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+int err = 0;
+int val;
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int prog_success(void *ctx)
+{
+	char buf[64] = {};
+	char write_data[64] = "hello there, world!!";
+	struct bpf_dynptr ptr = {}, mem = {};
+	__u8 mem_allocated = 0;
+	char read_data[64] = {};
+	__u32 val = 0;
+	void *data;
+	int i;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_dynptr_from_mem(buf, sizeof(buf), &ptr);
+	if (err)
+		goto done;
+
+	/* Write data into the dynptr */
+	err = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data));
+	if (err)
+		goto done;
+
+	/* Read the data that was written into the dynptr */
+	err = bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+	if (err)
+		goto done;
+
+	/* Ensure the data we read matches the data we wrote */
+	for (i = 0; i < sizeof(read_data); i++) {
+		if (read_data[i] != write_data[i]) {
+			err = 1;
+			goto done;
+		}
+	}
+
+done:
+	if (mem_allocated)
+		bpf_free(&mem);
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int prog_success_data_slice(void *ctx)
+{
+	struct bpf_dynptr mem;
+	void *data;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_malloc(16, &mem);
+	if (err)
+		goto done;
+
+	data = bpf_dynptr_data(&mem, 0, sizeof(__u32));
+	if (!data)
+		goto done;
+
+	*(__u32 *)data = 999;
+
+	err = bpf_probe_read_kernel(&val, sizeof(val), data);
+	if (err)
+		goto done;
+
+	if (val != *(__u32 *)data)
+		err = 2;
+
+done:
+	bpf_free(&mem);
+	return 0;
+}
+
+static int ringbuf_callback(__u32 index, void *data)
+{
+	struct sample *sample;
+
+	struct bpf_dynptr *ptr = (struct bpf_dynptr *)data;
+
+	sample = bpf_dynptr_data(ptr, 0, sizeof(*sample));
+	if (!sample)
+		return 0;
+
+	sample->pid += val;
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int prog_success_ringbuf(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+	struct sample *sample;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	/* check that you can reserve a dynamic size reservation */
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &ptr);
+	if (err)
+		goto done;
+
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	/* Can pass dynptr to callback functions */
+	bpf_loop(10, ringbuf_callback, &ptr, 0);
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers
  2022-04-02  1:58 ` [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers Joanne Koong
@ 2022-04-02  6:40   ` kernel test robot
  2022-04-06 22:50   ` Andrii Nakryiko
  1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2022-04-02  6:40 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: kbuild-all, andrii, ast, daniel, Joanne Koong

Hi Joanne,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Dynamic-pointers/20220402-100110
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20220402/202204021459.6f2G1oTF-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/64c5b9e2d2df7ff61dd8bd2e36a29ffff264e2ff
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joanne-Koong/Dynamic-pointers/20220402-100110
        git checkout 64c5b9e2d2df7ff61dd8bd2e36a29ffff264e2ff
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/bpf/ringbuf.c: In function '____bpf_ringbuf_reserve_dynptr':
>> kernel/bpf/ringbuf.c:491:18: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     491 |         sample = (void *)____bpf_ringbuf_reserve(map, size, flags);
         |                  ^


vim +491 kernel/bpf/ringbuf.c

   478	
   479	BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
   480		   struct bpf_dynptr_kern *, ptr)
   481	{
   482		void *sample;
   483		int err;
   484	
   485		err = bpf_dynptr_check_size(size);
   486		if (err) {
   487			bpf_dynptr_set_null(ptr);
   488			return err;
   489		}
   490	
 > 491		sample = (void *)____bpf_ringbuf_reserve(map, size, flags);
   492	
   493		if (!sample) {
   494			bpf_dynptr_set_null(ptr);
   495			return -EINVAL;
   496		}
   497	
   498		bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
   499	
   500		return 0;
   501	}
   502	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-02  1:58 ` [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
@ 2022-04-02 13:35   ` Toke Høiland-Jørgensen
  2022-04-04 20:18     ` Joanne Koong
  2022-04-06 22:32   ` Andrii Nakryiko
  1 sibling, 1 reply; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-04-02 13:35 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: andrii, ast, daniel, Joanne Koong

Joanne Koong <joannekoong@fb.com> writes:

> From: Joanne Koong <joannelkoong@gmail.com>
>
> This patch adds two helper functions, bpf_dynptr_read and
> bpf_dynptr_write:
>
> long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
>
> long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
>
> The dynptr passed into these functions must be valid dynptrs that have
> been initialized.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  6 ++++
>  include/uapi/linux/bpf.h       | 18 +++++++++++
>  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 18 +++++++++++
>  4 files changed, 98 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e0fcff9f2aee..cded9753fb7f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2426,6 +2426,12 @@ enum bpf_dynptr_type {
>  #define DYNPTR_MAX_SIZE	((1UL << 28) - 1)
>  #define DYNPTR_SIZE_MASK	0xFFFFFFF
>  #define DYNPTR_TYPE_SHIFT	29
> +#define DYNPTR_RDONLY_BIT	BIT(28)
> +
> +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> +{
> +	return ptr->size & DYNPTR_RDONLY_BIT;
> +}
>  
>  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
>  {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6a57d8a1b882..16a35e46be90 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5175,6 +5175,22 @@ union bpf_attr {
>   *		After this operation, *ptr* will be an invalidated dynptr.
>   *	Return
>   *		Void.
> + *
> + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> + *	Description
> + *		Read *len* bytes from *src* into *dst*, starting from *offset*
> + *		into *dst*.

nit: this should be "starting from *offset* into *src*", no? (same below)

-Toke



* Re: [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE as a bpf_type_flag
  2022-04-02  1:58 ` [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE " Joanne Koong
@ 2022-04-04  7:34   ` Kumar Kartikeya Dwivedi
  2022-04-04 19:04     ` Joanne Koong
  2022-04-06 18:42   ` Andrii Nakryiko
  1 sibling, 1 reply; 32+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-04  7:34 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, ast, daniel, Joanne Koong

On Sat, Apr 02, 2022 at 07:28:21AM IST, Joanne Koong wrote:
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Currently, we hardcode in the verifier which functions are release
> functions. We have no way of differentiating which argument is the one
> to be released (we assume it will always be the first argument).
>
> This patch adds MEM_RELEASE as a bpf_type_flag. This allows us to
> determine which argument in the function needs to be released, and
> removes having to hardcode a list of release functions into the
> verifier.
>
> Please note that currently, we only support one release argument in a
> helper function. In the future, if/when we need to support several
> release arguments within the function, MEM_RELEASE is necessary
> since there needs to be a way of differentiating which arguments are the
> release ones.
>
> In the near future, MEM_RELEASE will be used by dynptr helper functions
> such as bpf_free.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h          |  4 +++-
>  include/linux/bpf_verifier.h |  3 +--
>  kernel/bpf/btf.c             |  3 ++-
>  kernel/bpf/ringbuf.c         |  4 ++--
>  kernel/bpf/verifier.c        | 42 ++++++++++++++++++------------------
>  net/core/filter.c            |  2 +-
>  6 files changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6f2558da9d4a..cb9f42866cde 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -344,7 +344,9 @@ enum bpf_type_flag {
>
>  	MEM_UNINIT		= BIT(5 + BPF_BASE_TYPE_BITS),
>
> -	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
> +	MEM_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
> +
> +	__BPF_TYPE_LAST_FLAG	= MEM_RELEASE,
>  };
>
>  /* Max number of base types. */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index c1fc4af47f69..7a01adc9e13f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
>  		      const struct bpf_reg_state *reg, int regno);
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
> -			   enum bpf_arg_type arg_type,
> -			   bool is_release_func);
> +			   enum bpf_arg_type arg_type, bool arg_release);
>  int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>  			     u32 regno);
>  int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 0918a39279f6..e5b765a84aec 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5830,7 +5830,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
>  		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
>
> -		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
> +		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE,
> +					     rel && reg->ref_obj_id);
>  		if (ret < 0)
>  			return ret;
>
> diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
> index 710ba9de12ce..a723aa484ce4 100644
> --- a/kernel/bpf/ringbuf.c
> +++ b/kernel/bpf/ringbuf.c
> @@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
>  const struct bpf_func_proto bpf_ringbuf_submit_proto = {
>  	.func		= bpf_ringbuf_submit,
>  	.ret_type	= RET_VOID,
> -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | MEM_RELEASE,
>  	.arg2_type	= ARG_ANYTHING,
>  };
>
> @@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
>  const struct bpf_func_proto bpf_ringbuf_discard_proto = {
>  	.func		= bpf_ringbuf_discard,
>  	.ret_type	= RET_VOID,
> -	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
> +	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | MEM_RELEASE,
>  	.arg2_type	= ARG_ANYTHING,
>  };
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 90280d5666be..80e53303713e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -471,15 +471,12 @@ static bool type_may_be_null(u32 type)
>  	return type & PTR_MAYBE_NULL;
>  }
>
> -/* Determine whether the function releases some resources allocated by another
> - * function call. The first reference type argument will be assumed to be
> - * released by release_reference().
> +/* Determine whether the type releases some resources allocated by a
> + * previous function call.
>   */
> -static bool is_release_function(enum bpf_func_id func_id)
> +static bool type_is_release_mem(u32 type)
>  {
> -	return func_id == BPF_FUNC_sk_release ||
> -	       func_id == BPF_FUNC_ringbuf_submit ||
> -	       func_id == BPF_FUNC_ringbuf_discard;
> +	return type & MEM_RELEASE;
>  }
>
>  static bool may_be_acquire_function(enum bpf_func_id func_id)
> @@ -5364,13 +5361,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>
>  int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  			   const struct bpf_reg_state *reg, int regno,
> -			   enum bpf_arg_type arg_type,
> -			   bool is_release_func)
> +			   enum bpf_arg_type arg_type, bool arg_release)
>  {
> -	bool fixed_off_ok = false, release_reg;
> -	enum bpf_reg_type type = reg->type;
> +	bool fixed_off_ok = false;
>
> -	switch ((u32)type) {
> +	switch ((u32)reg->type) {
>  	case SCALAR_VALUE:
>  	/* Pointer types where reg offset is explicitly allowed: */
>  	case PTR_TO_PACKET:
> @@ -5393,18 +5388,15 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> -		/* When referenced PTR_TO_BTF_ID is passed to release function,
> -		 * it's fixed offset must be 0. We rely on the property that
> -		 * only one referenced register can be passed to BPF helpers and
> -		 * kfuncs. In the other cases, fixed offset can be non-zero.
> +		/* If a referenced PTR_TO_BTF_ID will be released, its fixed offset
> +		 * must be 0.
>  		 */
> -		release_reg = is_release_func && reg->ref_obj_id;
> -		if (release_reg && reg->off) {
> +		if (arg_release && reg->off) {
>  			verbose(env, "R%d must have zero offset when passed to release func\n",
>  				regno);
>  			return -EINVAL;
>  		}
> -		/* For release_reg == true, fixed_off_ok must be false, but we
> +		/* For arg_release == true, fixed_off_ok must be false, but we
>  		 * already checked and rejected reg->off != 0 above, so set to
>  		 * true to allow fixed offset for all other cases.
>  		 */
> @@ -5424,6 +5416,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
>  	enum bpf_arg_type arg_type = fn->arg_type[arg];
>  	enum bpf_reg_type type = reg->type;
> +	bool arg_release;
>  	int err = 0;
>
>  	if (arg_type == ARG_DONTCARE)
> @@ -5464,7 +5457,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  	if (err)
>  		return err;
>
> -	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
> +	arg_release = type_is_release_mem(arg_type);
> +	if (arg_release && !reg->ref_obj_id) {
> +		verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
> +			regno, arg + 1);
> +		return -EINVAL;
> +	}
> +
> +	err = check_func_arg_reg_off(env, reg, regno, arg_type, arg_release);
>  	if (err)
>  		return err;
>
> @@ -6693,7 +6693,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			return err;
>  	}
>
> -	if (is_release_function(func_id)) {
> +	if (meta.ref_obj_id) {

The meta.ref_obj_id field is set unconditionally whenever we see a
reg->ref_obj_id, e.g. when we pass a refcounted argument to non-release
function. Wouldn't making this conditional only on meta.ref_obj_id lead to
release of that register now? Or did I miss some change above which prevents
this case?

To make things clear, I'm talking of this sequence:

p = acquire();
helper_foo(p);   // meta.ref_obj_id would be set, and p is released
release(p);	 // error, as p.ref_obj_id has no reference state

Besides, in my series this PTR_RELEASE / MEM_RELEASE tagging is only needed
because the release function can take a NULL pointer, so we need to know the
register of the argument to be released, and then make sure it is refcounted,
otherwise it must be NULL (and whether NULL is permitted or not is checked
earlier during argument checks). That doesn't seem to be true for bpf_free in
your series, as it can only take ARG_PTR_TO_DYNPTR (but maybe it should also
set PTR_MAYBE_NULL).

>  		err = release_reference(env, meta.ref_obj_id);
>  		if (err) {
>  			verbose(env, "func %s#%d reference has not been acquired before\n",
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9aafec3a09ed..a935ce7a63bc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
>  	.func		= bpf_sk_release,
>  	.gpl_only	= false,
>  	.ret_type	= RET_INTEGER,
> -	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> +	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | MEM_RELEASE,
>  };
>
>  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> --
> 2.30.2
>

--
Kartikeya


* Re: [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE as a bpf_type_flag
  2022-04-04  7:34   ` Kumar Kartikeya Dwivedi
@ 2022-04-04 19:04     ` Joanne Koong
  0 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-04 19:04 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Mon, Apr 4, 2022 at 12:34 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, Apr 02, 2022 at 07:28:21AM IST, Joanne Koong wrote:
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > Currently, we hardcode in the verifier which functions are release
> > functions. We have no way of differentiating which argument is the one
> > to be released (we assume it will always be the first argument).
> >
> > This patch adds MEM_RELEASE as a bpf_type_flag. This allows us to
> > determine which argument in the function needs to be released, and
> > removes having to hardcode a list of release functions into the
> > verifier.
> >
> > Please note that currently, we only support one release argument in a
> > helper function. In the future, if/when we need to support several
> > release arguments within the function, MEM_RELEASE is necessary
> > since there needs to be a way of differentiating which arguments are the
> > release ones.
> >
> > In the near future, MEM_RELEASE will be used by dynptr helper functions
> > such as bpf_free.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
[...]
> > @@ -6693,7 +6693,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                       return err;
> >       }
> >
> > -     if (is_release_function(func_id)) {
> > +     if (meta.ref_obj_id) {
>
> The meta.ref_obj_id field is set unconditionally whenever we see a
> reg->ref_obj_id, e.g. when we pass a refcounted argument to non-release
> function. Wouldn't making this conditional only on meta.ref_obj_id lead to
> release of that register now? Or did I miss some change above which prevents
> this case?
>
Yes, unfortunately you are right. This wouldn't work for the cases
where a refcounted arg is passed to a non-release function, since that
also sets the meta.ref_obj_id. Thanks for catching this!

> To make things clear, I'm talking of this sequence:
>
> p = acquire();
> helper_foo(p);   // meta.ref_obj_id would be set, and p is released
> release(p);      // error, as p.ref_obj_id has no reference state
>
> Besides, in my series this PTR_RELEASE / MEM_RELEASE tagging is only needed
> because the release function can take a NULL pointer, so we need to know the
> register of the argument to be released, and then make sure it is refcounted,
> otherwise it must be NULL (and whether NULL is permitted or not is checked
> earlier during argument checks). That doesn't seem to be true for bpf_free in
> your series, as it can only take ARG_PTR_TO_DYNPTR (but maybe it should also
> set PTR_MAYBE_NULL).
>
In the dynptr case, there will be several release-type functions (eg
bpf_free, bpf_ringbuf_discard, bpf_ringbuf_submit). The motivation
behind this patch was to have some way of signifying this instead of
having to hardcode the particular functions in the verifier. Please let
me know if this addresses your comment, or if there's something between
the lines in your reply that I'm missing.


> >               err = release_reference(env, meta.ref_obj_id);
> >               if (err) {
> >                       verbose(env, "func %s#%d reference has not been acquired before\n",
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 9aafec3a09ed..a935ce7a63bc 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
> >       .func           = bpf_sk_release,
> >       .gpl_only       = false,
> >       .ret_type       = RET_INTEGER,
> > -     .arg1_type      = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
> > +     .arg1_type      = ARG_PTR_TO_BTF_ID_SOCK_COMMON | MEM_RELEASE,
> >  };
> >
> >  BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
> > --
> > 2.30.2
> >
>
> --
> Kartikeya

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-02 13:35   ` Toke Høiland-Jørgensen
@ 2022-04-04 20:18     ` Joanne Koong
  0 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-04 20:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Sat, Apr 2, 2022 at 6:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Joanne Koong <joannekoong@fb.com> writes:
>
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > This patch adds two helper functions, bpf_dynptr_read and
> > bpf_dynptr_write:
> >
> > long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
> >
> > long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
> >
> > The dynptr passed into these functions must be valid dynptrs that have
> > been initialized.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  6 ++++
> >  include/uapi/linux/bpf.h       | 18 +++++++++++
> >  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 18 +++++++++++
> >  4 files changed, 98 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e0fcff9f2aee..cded9753fb7f 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2426,6 +2426,12 @@ enum bpf_dynptr_type {
> >  #define DYNPTR_MAX_SIZE      ((1UL << 28) - 1)
> >  #define DYNPTR_SIZE_MASK     0xFFFFFFF
> >  #define DYNPTR_TYPE_SHIFT    29
> > +#define DYNPTR_RDONLY_BIT    BIT(28)
> > +
> > +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> > +{
> > +     return ptr->size & DYNPTR_RDONLY_BIT;
> > +}
> >
> >  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> >  {
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 6a57d8a1b882..16a35e46be90 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5175,6 +5175,22 @@ union bpf_attr {
> >   *           After this operation, *ptr* will be an invalidated dynptr.
> >   *   Return
> >   *           Void.
> > + *
> > + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> > + *   Description
> > + *           Read *len* bytes from *src* into *dst*, starting from *offset*
> > + *           into *dst*.
>
> nit: this should be "starting from *offset* into *src*", no? (same below)
>
Yes, this should be "starting from *offset* into *src*". I will fix
this line in both places. Thanks!
> -Toke
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-02  1:58 ` [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-04-06 18:33   ` Andrii Nakryiko
  0 siblings, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 18:33 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Instead of having uninitialized versions of arguments as separate
> bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
> of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
> modifier to denote that the argument is uninitialized.
>
> Doing so cleans up some of the logic in the verifier. We no longer
> need to do two checks against an argument type (eg "if
> (base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
> ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
> versions of the same argument type will now share the same base type.
>
> In the near future, MEM_UNINIT will be used by dynptr helper functions
> as well.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h      | 19 +++++++++++--------
>  kernel/bpf/bpf_lsm.c     |  4 ++--
>  kernel/bpf/cgroup.c      |  4 ++--
>  kernel/bpf/helpers.c     | 12 ++++++------
>  kernel/bpf/stackmap.c    |  6 +++---
>  kernel/bpf/verifier.c    | 25 ++++++++++---------------
>  kernel/trace/bpf_trace.c | 20 ++++++++++----------
>  net/core/filter.c        | 26 +++++++++++++-------------
>  8 files changed, 57 insertions(+), 59 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index bdb5298735ce..6f2558da9d4a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -342,7 +342,9 @@ enum bpf_type_flag {
>          */
>         MEM_PERCPU              = BIT(4 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = MEM_PERCPU,
> +       MEM_UNINIT              = BIT(5 + BPF_BASE_TYPE_BITS),
> +
> +       __BPF_TYPE_LAST_FLAG    = MEM_UNINIT,
>  };
>
>  /* Max number of base types. */
> @@ -361,16 +363,11 @@ enum bpf_arg_type {
>         ARG_CONST_MAP_PTR,      /* const argument used as pointer to bpf_map */
>         ARG_PTR_TO_MAP_KEY,     /* pointer to stack used as map key */
>         ARG_PTR_TO_MAP_VALUE,   /* pointer to stack used as map value */
> -       ARG_PTR_TO_UNINIT_MAP_VALUE,    /* pointer to valid memory used to store a map value */
>
> -       /* the following constraints used to prototype bpf_memcmp() and other
> -        * functions that access data on eBPF program stack
> +       /* Used to prototype bpf_memcmp() and other functions that access data
> +        * on eBPF program stack
>          */
>         ARG_PTR_TO_MEM,         /* pointer to valid memory (stack, packet, map value) */
> -       ARG_PTR_TO_UNINIT_MEM,  /* pointer to memory does not need to be initialized,
> -                                * helper function must fill all bytes or clear
> -                                * them in error case.
> -                                */
>
>         ARG_CONST_SIZE,         /* number of bytes accessed from memory */
>         ARG_CONST_SIZE_OR_ZERO, /* number of bytes accessed from memory or 0 */
> @@ -400,6 +397,12 @@ enum bpf_arg_type {
>         ARG_PTR_TO_SOCKET_OR_NULL       = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
>         ARG_PTR_TO_ALLOC_MEM_OR_NULL    = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
>         ARG_PTR_TO_STACK_OR_NULL        = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
> +       /* pointer to valid memory used to store a map value */
> +       ARG_PTR_TO_MAP_VALUE_UNINIT     = MEM_UNINIT | ARG_PTR_TO_MAP_VALUE,

seeing how this "alias" is used only in a few places, I'd just use
`ARG_PTR_TO_MAP_VALUE | MEM_UNINIT` directly in prototype declaration
and the MEM_UNINIT flag directly in verifier logic.

> +       /* pointer to memory does not need to be initialized, helper function must fill
> +        * all bytes or clear them in error case.
> +        */
> +       ARG_PTR_TO_MEM_UNINIT           = MEM_UNINIT | ARG_PTR_TO_MEM,
>
>         /* This must be the last entry. Its purpose is to ensure the enum is
>          * wide enough to hold the higher bits reserved for bpf_type_flag.

[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d175b70067b3..90280d5666be 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5136,8 +5136,7 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
>
>  static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
>  {
> -       return base_type(type) == ARG_PTR_TO_MEM ||
> -              base_type(type) == ARG_PTR_TO_UNINIT_MEM;
> +       return base_type(type) == ARG_PTR_TO_MEM;
>  }

Is this helper function even useful anymore? I'd just drop this
function altogether.

>
>  static bool arg_type_is_mem_size(enum bpf_arg_type type)
> @@ -5273,7 +5272,6 @@ static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE }
>  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>         [ARG_PTR_TO_MAP_KEY]            = &map_key_value_types,
>         [ARG_PTR_TO_MAP_VALUE]          = &map_key_value_types,
> -       [ARG_PTR_TO_UNINIT_MAP_VALUE]   = &map_key_value_types,
>         [ARG_CONST_SIZE]                = &scalar_types,
>         [ARG_CONST_SIZE_OR_ZERO]        = &scalar_types,
>         [ARG_CONST_ALLOC_SIZE_OR_ZERO]  = &scalar_types,
> @@ -5287,7 +5285,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>         [ARG_PTR_TO_BTF_ID]             = &btf_ptr_types,
>         [ARG_PTR_TO_SPIN_LOCK]          = &spin_lock_types,
>         [ARG_PTR_TO_MEM]                = &mem_types,
> -       [ARG_PTR_TO_UNINIT_MEM]         = &mem_types,
>         [ARG_PTR_TO_ALLOC_MEM]          = &alloc_mem_types,
>         [ARG_PTR_TO_INT]                = &int_ptr_types,
>         [ARG_PTR_TO_LONG]               = &int_ptr_types,
> @@ -5451,8 +5448,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 return -EACCES;
>         }
>
> -       if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> -           base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> +       if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
>                 err = resolve_map_arg_type(env, meta, &arg_type);
>                 if (err)
>                         return err;
> @@ -5528,8 +5524,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 err = check_helper_mem_access(env, regno,
>                                               meta->map_ptr->key_size, false,
>                                               NULL);
> -       } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
> -                  base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
> +       } else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
>                 if (type_may_be_null(arg_type) && register_is_null(reg))
>                         return 0;
>
> @@ -5541,7 +5536,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                         verbose(env, "invalid map_ptr to access map->value\n");
>                         return -EACCES;
>                 }
> -               meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
> +               meta->raw_mode = (arg_type == ARG_PTR_TO_MAP_VALUE_UNINIT);
>                 err = check_helper_mem_access(env, regno,
>                                               meta->map_ptr->value_size, false,
>                                               meta);
> @@ -5572,7 +5567,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 /* The access to this pointer is only checked when we hit the
>                  * next is_mem_size argument below.
>                  */
> -               meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MEM);
> +               meta->raw_mode = (arg_type == ARG_PTR_TO_MEM_UNINIT);

aside: raw_mode is a horrible name that communicates literally nothing
towards its semantics (IMO), would be nice to fix that, I'm always
confused by it

>         } else if (arg_type_is_mem_size(arg_type)) {
>                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>

[...]

> @@ -1406,7 +1406,7 @@ static const struct bpf_func_proto bpf_get_stack_proto_tp = {
>         .gpl_only       = true,
>         .ret_type       = RET_INTEGER,
>         .arg1_type      = ARG_PTR_TO_CTX,
> -       .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
> +       .arg2_type      = ARG_PTR_TO_MEM_UNINIT,
>         .arg3_type      = ARG_CONST_SIZE_OR_ZERO,
>         .arg4_type      = ARG_ANYTHING,
>  };
> @@ -1473,7 +1473,7 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
>           .gpl_only       = true,
>           .ret_type       = RET_INTEGER,
>           .arg1_type      = ARG_PTR_TO_CTX,
> -         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
> +        .arg2_type      = ARG_PTR_TO_MEM_UNINIT,

indentation is off

>           .arg3_type      = ARG_CONST_SIZE,
>  };
>

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE as a bpf_type_flag
  2022-04-02  1:58 ` [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE " Joanne Koong
  2022-04-04  7:34   ` Kumar Kartikeya Dwivedi
@ 2022-04-06 18:42   ` Andrii Nakryiko
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 18:42 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Currently, we hardcode in the verifier which functions are release
> functions. We have no way of differentiating which argument is the one
> to be released (we assume it will always be the first argument).
>
> This patch adds MEM_RELEASE as a bpf_type_flag. This allows us to
> determine which argument in the function needs to be released, and
> removes having to hardcode a list of release functions into the
> verifier.
>
> Please note that currently, we only support one release argument in a
> helper function. In the future, if/when we need to support several
> release arguments within the function, MEM_RELEASE is necessary
> since there needs to be a way of differentiating which arguments are the
> release ones.
>
> In the near future, MEM_RELEASE will be used by dynptr helper functions
> such as bpf_free.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h          |  4 +++-
>  include/linux/bpf_verifier.h |  3 +--
>  kernel/bpf/btf.c             |  3 ++-
>  kernel/bpf/ringbuf.c         |  4 ++--
>  kernel/bpf/verifier.c        | 42 ++++++++++++++++++------------------
>  net/core/filter.c            |  2 +-
>  6 files changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6f2558da9d4a..cb9f42866cde 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -344,7 +344,9 @@ enum bpf_type_flag {
>
>         MEM_UNINIT              = BIT(5 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = MEM_UNINIT,
> +       MEM_RELEASE             = BIT(6 + BPF_BASE_TYPE_BITS),

"MEM_" part seems a bit too specific, it's not necessarily (just)
about memory, it's more generally about "releasing resources" in
general, right? ARG_RELEASE or OBJ_RELEASE maybe?

> +
> +       __BPF_TYPE_LAST_FLAG    = MEM_RELEASE,
>  };
>

[...]

> -/* Determine whether the function releases some resources allocated by another
> - * function call. The first reference type argument will be assumed to be
> - * released by release_reference().
> +/* Determine whether the type releases some resources allocated by a
> + * previous function call.
>   */
> -static bool is_release_function(enum bpf_func_id func_id)
> +static bool type_is_release_mem(u32 type)
>  {
> -       return func_id == BPF_FUNC_sk_release ||
> -              func_id == BPF_FUNC_ringbuf_submit ||
> -              func_id == BPF_FUNC_ringbuf_discard;
> +       return type & MEM_RELEASE;
>  }
>

same skepticism regarding the need for this helper function, just
makes grepping code slightly harder

>  static bool may_be_acquire_function(enum bpf_func_id func_id)

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-02  1:58 ` [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free Joanne Koong
@ 2022-04-06 22:23   ` Andrii Nakryiko
  2022-04-08 22:04     ` Joanne Koong
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 22:23 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> This patch adds 3 new APIs and the bulk of the verifier work for
> supporting dynamic pointers in bpf.
>
> There are different types of dynptrs. This patch starts with the most
> basic ones, ones that reference a program's local memory
> (eg a stack variable) and ones that reference memory that is dynamically
> allocated on behalf of the program. If the memory is dynamically
> allocated by the program, the program *must* free it before the program
> exits. This is enforced by the verifier.
>
> The added APIs are:
>
> long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
> long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
> void bpf_free(struct bpf_dynptr *ptr);
>
> This patch sets up the verifier to support dynptrs. Dynptrs will always
> reside on the program's stack frame. As such, their state is tracked
> in their corresponding stack slot, which includes the type of dynptr
> (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
>
> When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> MEM_UNINIT), the stack slots corresponding to the frame pointer
> where the dynptr resides are marked as STACK_DYNPTR. For helper functions
> that take in initialized dynptrs (such as the next patch in this series
> which supports dynptr reads/writes), the verifier enforces that the
> dynptr has been initialized by checking that their corresponding stack
> slots have been marked as STACK_DYNPTR. Dynptr release functions
> (eg bpf_free) will clear the stack slots. The verifier enforces at program
> exit that there are no dynptr stack slots that need to be released.
>
> There are other constraints that are enforced by the verifier as
> well, such as that the dynptr cannot be written to directly by the bpf
> program or by non-dynptr helper functions. The last patch in this series
> contains tests that trigger different cases that the verifier needs to
> successfully reject.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  74 ++++++++-
>  include/linux/bpf_verifier.h   |  18 +++
>  include/uapi/linux/bpf.h       |  40 +++++
>  kernel/bpf/helpers.c           |  88 +++++++++++
>  kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  40 +++++
>  7 files changed, 521 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index cb9f42866cde..e0fcff9f2aee 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -346,7 +346,13 @@ enum bpf_type_flag {
>
>         MEM_RELEASE             = BIT(6 + BPF_BASE_TYPE_BITS),
>
> -       __BPF_TYPE_LAST_FLAG    = MEM_RELEASE,
> +       /* DYNPTR points to a program's local memory (eg stack variable). */
> +       DYNPTR_TYPE_LOCAL       = BIT(7 + BPF_BASE_TYPE_BITS),
> +
> +       /* DYNPTR points to dynamically allocated memory. */
> +       DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
> +
> +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
>  };
>
>  /* Max number of base types. */
> @@ -390,6 +396,7 @@ enum bpf_arg_type {
>         ARG_PTR_TO_STACK,       /* pointer to stack */
>         ARG_PTR_TO_CONST_STR,   /* pointer to a null terminated read-only string */
>         ARG_PTR_TO_TIMER,       /* pointer to bpf_timer */
> +       ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
>         __BPF_ARG_TYPE_MAX,
>
>         /* Extended arg_types. */
> @@ -2396,4 +2403,69 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
>                         u32 **bin_buf, u32 num_args);
>  void bpf_bprintf_cleanup(void);
>
> +/* the implementation of the opaque uapi struct bpf_dynptr */
> +struct bpf_dynptr_kern {
> +       u8 *data;

nit: u8 * is too specific, it's not always "bytes" of data. Let's use `void *`?

> +       /* The upper 4 bits are reserved. Bit 29 denotes whether the
> +        * dynptr is read-only. Bits 30 - 32 denote the dynptr type.
> +        */

not essential, but I think using highest bit for read-only and then
however many next upper bits for dynptr kind is a bit cleaner
approach.

also it seems like normally bits are zero-indexed, so, pedantically,
there is no bit 32, it's bit #31

> +       u32 size;
> +       u32 offset;

Let's document the semantics of offset and size. E.g., if I have
offset 4 and size 20, does it mean there were 24 bytes, but we ignore
first 4 and can address next 20, or does it mean that there is 20
bytes, we skip the first 4 and have 16 addressable. Basically, usable
size is just size - offset? That will change how/whether the size
is adjusted when offset is moved.

> +} __aligned(8);
> +
> +enum bpf_dynptr_type {

it's a good idea to have default zero value to be BPF_DYNPTR_TYPE_INVALID

> +       /* Local memory used by the bpf program (eg stack variable) */
> +       BPF_DYNPTR_TYPE_LOCAL,
> +       /* Memory allocated dynamically by the kernel for the dynptr */
> +       BPF_DYNPTR_TYPE_MALLOC,
> +};
> +
> +/* The upper 4 bits of dynptr->size are reserved. Consequently, the
> + * maximum supported size is 2^28 - 1.
> + */
> +#define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
> +#define DYNPTR_SIZE_MASK       0xFFFFFFF
> +#define DYNPTR_TYPE_SHIFT      29

I'm thinking that maybe we should start with reserving entire upper
byte in size and offset to be on the safer side? And if 16MB of
addressable memory blob isn't enough, we can always relax it later.
WDYT?

> +
> +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> +{
> +       return ptr->size >> DYNPTR_TYPE_SHIFT;
> +}
> +
> +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> +{
> +       ptr->size |= type << DYNPTR_TYPE_SHIFT;
> +}
> +
> +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> +{
> +       return ptr->size & DYNPTR_SIZE_MASK;
> +}
> +
> +static inline int bpf_dynptr_check_size(u32 size)
> +{
> +       if (size == 0)
> +               return -EINVAL;

What's the downside of allowing size 0? Honest question. I'm wondering
why prevent having a dynptr pointing to an "empty slice"? It might be a
useful feature in practice.

> +
> +       if (size > DYNPTR_MAX_SIZE)
> +               return -E2BIG;
> +
> +       return 0;
> +}
> +
> +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> +{
> +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
> +
> +       if (len > capacity || offset > capacity - len)
> +               return -EINVAL;
> +
> +       return 0;
> +}
> +
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +                    u32 offset, u32 size);
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 7a01adc9e13f..bc0f105148f9 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -72,6 +72,18 @@ struct bpf_reg_state {
>
>                 u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
>
> +               /* for dynptr stack slots */
> +               struct {
> +                       enum bpf_dynptr_type dynptr_type;
> +                       /* A dynptr is 16 bytes so it takes up 2 stack slots.
> +                        * We need to track which slot is the first slot
> +                        * to protect against cases where the user may try to
> +                        * pass in an address starting at the second slot of the
> +                        * dynptr.
> +                        */
> +                       bool dynptr_first_slot;
> +               };

why not

struct {
    enum bpf_dynptr_type type;
    bool first_lot;
} dynptr;

? I think it's cleaner grouping

> +
>                 /* Max size from any of the above. */
>                 struct {
>                         unsigned long raw1;
> @@ -174,9 +186,15 @@ enum bpf_stack_slot_type {
>         STACK_SPILL,      /* register spilled into stack */
>         STACK_MISC,       /* BPF program wrote some data into this slot */
>         STACK_ZERO,       /* BPF program wrote constant zero */
> +       /* A dynptr is stored in this stack slot. The type of dynptr
> +        * is stored in bpf_stack_state->spilled_ptr.type
> +        */
> +       STACK_DYNPTR,
>  };
>
>  #define BPF_REG_SIZE 8 /* size of eBPF register in bytes */
> +#define BPF_DYNPTR_SIZE 16 /* size of a struct bpf_dynptr in bytes */
> +#define BPF_DYNPTR_NR_SLOTS 2

#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
#define BPF_DYNPTR_NR_SLOTS BPF_DYNPTR_SIZE / BPF_REG_SIZE

?

>
>  struct bpf_stack_state {
>         struct bpf_reg_state spilled_ptr;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d14b10b85e51..6a57d8a1b882 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5143,6 +5143,38 @@ union bpf_attr {
>   *             The **hash_algo** is returned on success,
>   *             **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
>   *             invalid arguments are passed.
> + *
> + * long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr)
> + *     Description
> + *             Get a dynptr to local memory *data*.
> + *
> + *             For a dynptr to a dynamic memory allocation, please use bpf_malloc
> + *             instead.
> + *
> + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *     Return
> + *             0 on success or -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.

Isn't it a -E2BIG for too big size?

> + *
> + * long bpf_malloc(u32 size, struct bpf_dynptr *ptr)

I think at least for bpf_malloc() we should add u64 flags argument for
future extensibility. Also API design-wise, while I get why *ptr is at
the end because it's a out parameter, it feels a bit unnatural to have
flags before the pointer itself. Maybe let's just have ptr as first
argument for all constructor APIs consistently, even though it's an
out parameter?

I'd also add flags to bpf_dynpt_from_mem() as well for extensibility.

> + *     Description
> + *             Dynamically allocate memory of *size* bytes.
> + *
> + *             Every call to bpf_malloc must have a corresponding
> + *             bpf_free, regardless of whether the bpf_malloc
> + *             succeeded.
> + *
> + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *     Return
> + *             0 on success, -ENOMEM if there is not enough memory for the
> + *             allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
> + *
> + * void bpf_free(struct bpf_dynptr *ptr)

thinking about the next patch set that will add storing this malloc
dynptr into the map, bpf_free() will be a lie, right? As it will only
decrement a refcnt, not necessarily free it, right? So maybe just
generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
a bit more "truthful"?

> + *     Description
> + *             Free memory allocated by bpf_malloc.
> + *
> + *             After this operation, *ptr* will be an invalidated dynptr.
> + *     Return
> + *             Void.
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -5339,6 +5371,9 @@ union bpf_attr {
>         FN(copy_from_user_task),        \
>         FN(skb_set_tstamp),             \
>         FN(ima_file_hash),              \
> +       FN(dynptr_from_mem),            \
> +       FN(malloc),                     \
> +       FN(free),                       \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6486,6 +6521,11 @@ struct bpf_timer {
>         __u64 :64;
>  } __attribute__((aligned(8)));
>
> +struct bpf_dynptr {
> +       __u64 :64;
> +       __u64 :64;
> +} __attribute__((aligned(8)));
> +
>  struct bpf_sysctl {
>         __u32   write;          /* Sysctl is being read (= 0) or written (= 1).
>                                  * Allows 1,2,4-byte read, but no write.
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index cc6d480c5c23..ed5a7d9d0a18 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1374,6 +1374,88 @@ void bpf_timer_cancel_and_free(void *val)
>         kfree(t);
>  }
>
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +                    u32 offset, u32 size)
> +{
> +       ptr->data = data;
> +       ptr->offset = offset;
> +       ptr->size = size;
> +       bpf_dynptr_set_type(ptr, type);
> +}
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
> +{
> +       memset(ptr, 0, sizeof(*ptr));
> +}
> +
> +BPF_CALL_3(bpf_dynptr_from_mem, void *, data, u32, size, struct bpf_dynptr_kern *, ptr)
> +{
> +       int err;
> +
> +       err = bpf_dynptr_check_size(size);
> +       if (err) {
> +               bpf_dynptr_set_null(ptr);
> +               return err;
> +       }
> +
> +       bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_LOCAL, 0, size);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> +       .func           = bpf_dynptr_from_mem,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_PTR_TO_MEM,

need to think what to do with uninit stack slots. Do we need
bpf_dynptr_from_uninit_mem() or we just allow ARG_PTR_TO_MEM |
MEM_UNINIT here?

> +       .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
> +       .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> +};
> +
> +BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
> +{
> +       void *data;
> +       int err;
> +
> +       err = bpf_dynptr_check_size(size);
> +       if (err) {
> +               bpf_dynptr_set_null(ptr);
> +               return err;
> +       }
> +
> +       data = kmalloc(size, GFP_ATOMIC);

we have this fancy logic now to allow non-atomic allocation inside
sleepable programs, can we use that here as well? In sleepable mode it
would be nice to wait for malloc() to grab necessary memory, if
possible.

> +       if (!data) {
> +               bpf_dynptr_set_null(ptr);
> +               return -ENOMEM;
> +       }
> +

so.... kmalloc() doesn't zero initialize the memory. I think it's a
great property (which we can later modify with flags, if necessary),
so I'd do zero-initialization by default. We can keep calling it
bpf_malloc() instead of bpf_zalloc(), of course.


> +       bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
> +
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_malloc_proto = {
> +       .func           = bpf_malloc,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_ANYTHING,
> +       .arg2_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
> +};
> +
> +BPF_CALL_1(bpf_free, struct bpf_dynptr_kern *, dynptr)
> +{
> +       kfree(dynptr->data);
> +       bpf_dynptr_set_null(dynptr);
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_free_proto = {
> +       .func           = bpf_free,
> +       .gpl_only       = false,
> +       .ret_type       = RET_VOID,
> +       .arg1_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_RELEASE,
> +};
> +
>  const struct bpf_func_proto bpf_get_current_task_proto __weak;
>  const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
>  const struct bpf_func_proto bpf_probe_read_user_proto __weak;
> @@ -1426,6 +1508,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>                 return &bpf_loop_proto;
>         case BPF_FUNC_strncmp:
>                 return &bpf_strncmp_proto;
> +       case BPF_FUNC_dynptr_from_mem:
> +               return &bpf_dynptr_from_mem_proto;
> +       case BPF_FUNC_malloc:
> +               return &bpf_malloc_proto;
> +       case BPF_FUNC_free:
> +               return &bpf_free_proto;
>         default:
>                 break;
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 80e53303713e..cb3bcb54d4b4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -479,6 +479,11 @@ static bool type_is_release_mem(u32 type)
>         return type & MEM_RELEASE;
>  }
>
> +static bool type_is_uninit_mem(u32 type)
> +{
> +       return type & MEM_UNINIT;
> +}
> +

ditto about the need for a helper

>  static bool may_be_acquire_function(enum bpf_func_id func_id)
>  {
>         return func_id == BPF_FUNC_sk_lookup_tcp ||
> @@ -583,6 +588,7 @@ static char slot_type_char[] = {
>         [STACK_SPILL]   = 'r',
>         [STACK_MISC]    = 'm',
>         [STACK_ZERO]    = '0',
> +       [STACK_DYNPTR]  = 'd',
>  };
>
>  static void print_liveness(struct bpf_verifier_env *env,
> @@ -598,6 +604,18 @@ static void print_liveness(struct bpf_verifier_env *env,
>                 verbose(env, "D");
>  }
>
> +static inline int get_spi(s32 off)
> +{
> +       return (-off - 1) / BPF_REG_SIZE;
> +}
> +
> +static bool check_spi_bounds(struct bpf_func_state *state, int spi, u32 nr_slots)

"check_xxx"/"validate_xxx" patterns have ambiguity when it comes to
interpreting their return value. In some cases it would be 0 for success
and <0 for error; in this case it's true/false, where presumably true
means all good. It's unfortunate to have to think about this when
reading code. If you call it something like "is_stack_range_valid" it
would be much more natural to read and reason about, IMO.

BTW, what does "spi" stand for? "stack pointer index"? slot_idx?

> +{
> +       int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +       return allocated_slots > spi && nr_slots - 1 <= spi;

ok, this is personal preference, but it took me considerable time to
understand what's being checked here (the backwards growth of slot
indices also threw me off). But it seems like we have a range of slots
that is calculated as [spi - nr_slots + 1, spi] and we want to check
that it's within [0, allocated_slots), so the most straightforward way
would be:

return spi - nr_slots + 1 >= 0 && spi < allocated_slots;

And I'd definitely leave a comment about how the indices grow downwards
(it's not immediately obvious even if you know that indices
are derived from negative stack offsets)

> +}
> +
>  static struct bpf_func_state *func(struct bpf_verifier_env *env,
>                                    const struct bpf_reg_state *reg)
>  {
> @@ -649,6 +667,133 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
>         env->scratched_stack_slots = ~0ULL;
>  }
>
> +static int arg_to_dynptr_type(enum bpf_arg_type arg_type, enum bpf_dynptr_type *dynptr_type)
> +{
> +       int type = arg_type & (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC);

maybe let's define DYNPTR_TYPE_MASK that can be updated as we add new
types of dynptr?

> +
> +       switch (type) {
> +       case DYNPTR_TYPE_LOCAL:
> +               *dynptr_type = BPF_DYNPTR_TYPE_LOCAL;
> +               break;
> +       case DYNPTR_TYPE_MALLOC:
> +               *dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
> +               break;
> +       default:
> +               /* Can't have more than one type set and can't have no
> +                * type set
> +                */
> +               return -EINVAL;

see above about BPF_DYNPTR_TYPE_INVALID, with that you don't have to
use out parameter, just return enum bpf_dynptr_type directly with
BPF_DYNPTR_TYPE_INVALID marking an error

> +       }
> +
> +       return 0;
> +}
> +
> +static bool dynptr_type_refcounted(struct bpf_func_state *state, int spi)

if you pass enum bpf_dynptr_type directly instead of spi, this function
will be more generic and won't conflate two separate operations
(fetching stack state and checking whether the dynptr is refcounted)

> +{
> +       enum bpf_dynptr_type type = state->stack[spi].spilled_ptr.dynptr_type;
> +
> +       return type == BPF_DYNPTR_TYPE_MALLOC;
> +}
> +
> +static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +                                  enum bpf_arg_type arg_type)
> +{
> +       struct bpf_func_state *state = cur_func(env);
> +       enum bpf_dynptr_type type;
> +       int spi, i, err;
> +
> +       spi = get_spi(reg->off);
> +
> +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> +               return -EINVAL;
> +
> +       err = arg_to_dynptr_type(arg_type, &type);
> +       if (unlikely(err))

why unlikely()? don't micro-optimize, let the compiler do its job

> +               return err;
> +
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +               state->stack[spi].slot_type[i] = STACK_DYNPTR;
> +               state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> +       }
> +
> +       state->stack[spi].spilled_ptr.dynptr_type = type;
> +       state->stack[spi - 1].spilled_ptr.dynptr_type = type;
> +
> +       state->stack[spi].spilled_ptr.dynptr_first_slot = true;
> +
> +       return 0;
> +}
> +
> +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       int spi, i;
> +
> +       spi = get_spi(reg->off);
> +
> +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> +               return -EINVAL;
> +
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +               state->stack[spi].slot_type[i] = STACK_INVALID;
> +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> +       }
> +
> +       state->stack[spi].spilled_ptr.dynptr_type = 0;
> +       state->stack[spi].spilled_ptr.dynptr_first_slot = 0;
> +       state->stack[spi - 1].spilled_ptr.dynptr_type = 0;
> +
> +       return 0;
> +}
> +
> +/* Check if the dynptr argument is a proper initialized dynptr */
> +static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +                             enum bpf_arg_type arg_type)

is_dynptr_valid()? You are not checking if it's just initialized but
also that it matches arg_type, right? Also see my rambling about
check_xxx naming

> +{
> +       struct bpf_func_state *state = func(env, reg);
> +       enum bpf_dynptr_type expected_type;
> +       int spi, err;
> +
> +       /* Can't pass in a dynptr at a weird offset */
> +       if (reg->off % BPF_REG_SIZE)
> +               return false;
> +
> +       spi = get_spi(reg->off);
> +
> +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> +               return false;
> +
> +       if (!state->stack[spi].spilled_ptr.dynptr_first_slot)
> +               return false;
> +
> +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> +               return false;
> +
> +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> +       if (arg_type == ARG_PTR_TO_DYNPTR)
> +               return true;
> +
> +       err = arg_to_dynptr_type(arg_type, &expected_type);
> +       if (unlikely(err))
> +               return err;
> +
> +       return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
> +}
> +
> +static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
> +{
> +       int nr_slots, i;
> +
> +       nr_slots = min(roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE, spi + 1);
> +

this min(..., spi + 1) looks a bit like papering over an access out of
stack bounds... if it's checked somewhere else, we can just assume it's
not happening; if not, we should probably error out with a different
message (it's not about dynptr anymore)

> +       for (i = 0; i < nr_slots; i++) {
> +               if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +
>  /* The reg state of a pointer or a bounded scalar was saved when
>   * it was spilled to the stack.
>   */
> @@ -2885,6 +3030,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>         }
>
>         mark_stack_slot_scratched(env, spi);
> +
> +       if (stack_access_into_dynptr(state, spi, size)) {
> +               verbose(env, "direct write into dynptr is not permitted\n");
> +               return -EINVAL;
> +       }
> +
>         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
>             !register_is_null(reg) && env->bpf_capable) {
>                 if (dst_reg != BPF_REG_FP) {
> @@ -3006,6 +3157,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>                 slot = -i - 1;
>                 spi = slot / BPF_REG_SIZE;
>                 stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> +
> +               if (*stype == STACK_DYNPTR) {
> +                       verbose(env, "direct write into dynptr is not permitted\n");
> +                       return -EINVAL;
> +               }
> +
>                 mark_stack_slot_scratched(env, spi);
>
>                 if (!env->allow_ptr_leaks
> @@ -5153,6 +5310,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
>                type == ARG_PTR_TO_LONG;
>  }
>
> +static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
> +{
> +       return base_type(type) == ARG_PTR_TO_DYNPTR;
> +}
> +
> +static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
> +{
> +       return arg_type_is_dynptr(type) && type & MEM_UNINIT;

please add ( ) around (type & MEM_UNINIT) to make clear operation
priority when combining with &&

> +}
> +
>  static int int_ptr_type_to_size(enum bpf_arg_type type)
>  {
>         if (type == ARG_PTR_TO_INT)
> @@ -5290,6 +5457,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>         [ARG_PTR_TO_STACK]              = &stack_ptr_types,
>         [ARG_PTR_TO_CONST_STR]          = &const_str_ptr_types,
>         [ARG_PTR_TO_TIMER]              = &timer_types,
> +       [ARG_PTR_TO_DYNPTR]             = &stack_ptr_types,
>  };
>
>  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> @@ -5408,6 +5576,15 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>         return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
>  }
>
> +/*
> + * Determines whether the id used for reference tracking is held in a stack slot
> + * or in a register
> + */
> +static bool id_in_stack_slot(enum bpf_arg_type arg_type)

an is_ or has_ prefix is a good idea for such bool-returning helpers
(similarly for stack_access_into_dynptr above), otherwise it reads like
a verb and a command to do something

but looking a few lines below, if (arg_type_is_dynptr()) would be
clearer than an extra wrapper function; I'm not sure what the purpose
of the helper is

> +{
> +       return arg_type_is_dynptr(arg_type);
> +}
> +
>  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                           struct bpf_call_arg_meta *meta,
>                           const struct bpf_func_proto *fn)
> @@ -5458,10 +5635,19 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 return err;
>
>         arg_release = type_is_release_mem(arg_type);
> -       if (arg_release && !reg->ref_obj_id) {
> -               verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
> -                       regno, arg + 1);
> -               return -EINVAL;
> +       if (arg_release) {
> +               if (id_in_stack_slot(arg_type)) {
> +                       struct bpf_func_state *state = func(env, reg);
> +                       int spi = get_spi(reg->off);
> +
> +                       if (!state->stack[spi].spilled_ptr.id)
> +                               goto unacquired_ref_err;
> +               } else if (!reg->ref_obj_id)  {
> +unacquired_ref_err:

oh, this goto into the middle of else branch is nasty, is it such a
big deal to have this verbose() copied (or even tailored specifically
to dynptr)?

> +                       verbose(env, "R%d arg #%d is an unacquired reference and hence cannot be released\n",
> +                               regno, arg + 1);
> +                       return -EINVAL;
> +               }
>         }
>
>         err = check_func_arg_reg_off(env, reg, regno, arg_type, arg_release);
> @@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>
>                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> +       } else if (arg_type_is_dynptr(arg_type)) {
> +               bool initialized = check_dynptr_init(env, reg, arg_type);
> +
> +               if (type_is_uninit_mem(arg_type)) {
> +                       if (initialized) {
> +                               verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
> +                                       arg + 1);
> +                               return -EINVAL;
> +                       }
> +                       meta->raw_mode = true;
> +                       err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
> +                       /* For now, we do not allow dynptrs to point to existing
> +                        * refcounted memory
> +                        */
> +                       if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {

hard-coded BPF_REG_1?

> +                               verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
> +                                       arg + 1);
> +                               return -EINVAL;
> +                       }
> +               } else {
> +                       if (!initialized) {
> +                               char *err_extra = "";

const char *

> +
> +                               if (arg_type & DYNPTR_TYPE_LOCAL)
> +                                       err_extra = "local ";
> +                               else if (arg_type & DYNPTR_TYPE_MALLOC)
> +                                       err_extra = "malloc ";
> +                               verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> +                                       err_extra, arg + 1);

what if a helper accepts two or more different types of dynptr?

> +                               return -EINVAL;
> +                       }
> +                       if (type_is_release_mem(arg_type))
> +                               err = unmark_stack_slots_dynptr(env, reg);
> +               }
>         } else if (arg_type_is_alloc_size(arg_type)) {
>                 if (!tnum_is_const(reg->var_off)) {
>                         verbose(env, "R%d is not a known constant'\n",

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-02  1:58 ` [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
  2022-04-02 13:35   ` Toke Høiland-Jørgensen
@ 2022-04-06 22:32   ` Andrii Nakryiko
  2022-04-08 23:07     ` Joanne Koong
  1 sibling, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 22:32 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> This patch adds two helper functions, bpf_dynptr_read and
> bpf_dynptr_write:
>
> long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
>
> long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
>
> The dynptr passed into these functions must be valid dynptrs that have
> been initialized.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  6 ++++
>  include/uapi/linux/bpf.h       | 18 +++++++++++
>  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 18 +++++++++++
>  4 files changed, 98 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e0fcff9f2aee..cded9753fb7f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2426,6 +2426,12 @@ enum bpf_dynptr_type {
>  #define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
>  #define DYNPTR_SIZE_MASK       0xFFFFFFF
>  #define DYNPTR_TYPE_SHIFT      29
> +#define DYNPTR_RDONLY_BIT      BIT(28)
> +
> +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> +{
> +       return ptr->size & DYNPTR_RDONLY_BIT;
> +}
>
>  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
>  {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6a57d8a1b882..16a35e46be90 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5175,6 +5175,22 @@ union bpf_attr {
>   *             After this operation, *ptr* will be an invalidated dynptr.
>   *     Return
>   *             Void.
> + *
> + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> + *     Description
> + *             Read *len* bytes from *src* into *dst*, starting from *offset*
> + *             into *dst*.
> + *     Return
> + *             0 on success, -EINVAL if *offset* + *len* exceeds the length
> + *             of *src*'s data or if *src* is an invalid dynptr.
> + *
> + * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
> + *     Description
> + *             Write *len* bytes from *src* into *dst*, starting from *offset*
> + *             into *dst*.
> + *     Return
> + *             0 on success, -EINVAL if *offset* + *len* exceeds the length
> + *             of *dst*'s data or if *dst* is not writeable.

Did you plan to also add a helper to copy from one dynptr to another?
Something like

long bpf_dynptr_copy(struct bpf_dynptr *dst, struct bpf_dyn_ptr *src, u32 len) ?

Otherwise there won't be any way to copy memory from a malloc'ed range
to a ringbuf, for example, without doing an intermediate copy. Not sure
what to do about extra offsets...

>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -5374,6 +5390,8 @@ union bpf_attr {
>         FN(dynptr_from_mem),            \
>         FN(malloc),                     \
>         FN(free),                       \
> +       FN(dynptr_read),                \
> +       FN(dynptr_write),               \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index ed5a7d9d0a18..7ec20e79928e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1412,6 +1412,58 @@ const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
>         .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
>  };
>
> +BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
> +{
> +       int err;
> +
> +       if (!src->data)
> +               return -EINVAL;
> +
> +       err = bpf_dynptr_check_off_len(src, offset, len);

you defined this function in patch #3, but didn't use it there. Let's
move the definition into this patch?

> +       if (err)
> +               return err;
> +
> +       memcpy(dst, src->data + src->offset + offset, len);
> +
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers
  2022-04-02  1:58 ` [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers Joanne Koong
  2022-04-02  6:40   ` kernel test robot
@ 2022-04-06 22:50   ` Andrii Nakryiko
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 22:50 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> Currently, our only way of writing dynamically-sized data into a ring
> buffer is through bpf_ringbuf_output but this incurs an extra memcpy
> cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
> memcpy, but it can only safely support reservation sizes that are
> statically known since the verifier cannot guarantee that the bpf
> program won’t access memory outside the reserved space.
>
> The bpf_dynptr abstraction allows for dynamically-sized ring buffer
> reservations without the extra memcpy.
>
> There are 3 new APIs:
>
> long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
> void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);
>
> These closely follow the functionalities of the original ringbuf APIs.
> For example, all ringbuffer dynptrs that have been reserved must be
> either submitted or discarded before the program exits.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            | 10 ++++-
>  include/uapi/linux/bpf.h       | 30 ++++++++++++++
>  kernel/bpf/helpers.c           |  6 +++
>  kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c          | 17 ++++++--
>  tools/include/uapi/linux/bpf.h | 30 ++++++++++++++
>  6 files changed, 160 insertions(+), 4 deletions(-)
>

Looks great and is a very straightforward implementation, great job!
Please fix the warning and maybe expand a bit on "failure modes", but
otherwise:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  /* The upper 4 bits of dynptr->size are reserved. Consequently, the
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c835e437cb28..778de0b052c1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5202,6 +5202,33 @@ union bpf_attr {
>   *             Pointer to the underlying dynptr data, NULL if the ptr is
>   *             read-only, if the dynptr is invalid, or if the offset and length
>   *             is out of bounds.
> + *
> + * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)

looking at this, out param dynptr at the end makes more sense again...
ok, I'm fine either way I guess :)

> + *     Description
> + *             Reserve *size* bytes of payload in a ring buffer *ringbuf*
> + *             through the dynptr interface. *flags* must be 0.
> + *     Return
> + *             0 on success, or a negative error in case of failure.

It would be good to mention that the verifier will enforce submit or
discard even when the reservation fails, and that submit_dynptr/discard_dynptr
is a no-op for such failed/null dynptrs.

> + *
> + * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
> + *     Description
> + *             Submit reserved ring buffer sample, pointed to by *data*,
> + *             through the dynptr interface.
> + *
> + *             For more information on *flags*, please see
> + *             'bpf_ringbuf_submit'.
> + *     Return
> + *             Nothing. Always succeeds.
> + *

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 7/7] bpf: Dynptr tests
  2022-04-02  1:58 ` [PATCH bpf-next v1 7/7] bpf: Dynptr tests Joanne Koong
@ 2022-04-06 23:11   ` Andrii Nakryiko
  2022-04-08 23:16     ` Joanne Koong
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 23:11 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> This patch adds tests for dynptrs. These include scenarios that the
> verifier needs to reject, as well as some successful use cases of
> dynptrs that should pass.
>
> Some of the failure scenarios include checking against invalid bpf_frees,
> invalid writes, invalid reads, and invalid ringbuf API usages.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---

Great set of tests! Hard to keep reading 500+ lines of failing use
cases, but seems like a lot of interesting corner cases are handled!
Great job!

>  .../testing/selftests/bpf/prog_tests/dynptr.c | 303 ++++++++++
>  .../testing/selftests/bpf/progs/dynptr_fail.c | 527 ++++++++++++++++++
>  .../selftests/bpf/progs/dynptr_success.c      | 147 +++++
>  3 files changed, 977 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> new file mode 100644
> index 000000000000..7107ebee3427
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> @@ -0,0 +1,303 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Facebook */
> +
> +#include <test_progs.h>
> +#include "dynptr_fail.skel.h"
> +#include "dynptr_success.skel.h"
> +
> +size_t log_buf_sz = 1024 * 1024;
> +
> +enum fail_case {
> +       MISSING_FREE,
> +       MISSING_FREE_CALLBACK,
> +       INVALID_FREE1,
> +       INVALID_FREE2,
> +       USE_AFTER_FREE,
> +       MALLOC_TWICE,
> +       INVALID_MAP_CALL1,
> +       INVALID_MAP_CALL2,
> +       RINGBUF_INVALID_ACCESS,
> +       RINGBUF_INVALID_API,
> +       RINGBUF_OUT_OF_BOUNDS,
> +       DATA_SLICE_OUT_OF_BOUNDS,
> +       DATA_SLICE_USE_AFTER_FREE,
> +       INVALID_HELPER1,
> +       INVALID_HELPER2,
> +       INVALID_WRITE1,
> +       INVALID_WRITE2,
> +       INVALID_WRITE3,
> +       INVALID_WRITE4,
> +       INVALID_READ1,
> +       INVALID_READ2,
> +       INVALID_READ3,
> +       INVALID_OFFSET,
> +       GLOBAL,
> +       FREE_TWICE,
> +       FREE_TWICE_CALLBACK,
> +};

it might make sense to just pass the program name as a string instead,
just like expected error message. This will allow more table-like
subtest specification (I'll expand below)

> +
> +static void verify_fail(enum fail_case fail, char *obj_log_buf,  char *err_msg)

nit: extra space

> +{
> +       LIBBPF_OPTS(bpf_object_open_opts, opts);
> +       struct bpf_program *prog;
> +       struct dynptr_fail *skel;
> +       int err;
> +
> +       opts.kernel_log_buf = obj_log_buf;
> +       opts.kernel_log_size = log_buf_sz;

see below, this could easily be just a static array variable, no need
to pass it in

> +       opts.kernel_log_level = 1;
> +
> +       skel = dynptr_fail__open_opts(&opts);
> +       if (!ASSERT_OK_PTR(skel, "skel_open"))
> +               return;
> +
> +       bpf_object__for_each_program(prog, skel->obj)
> +               bpf_program__set_autoload(prog, false);
> +
> +       /* these programs should all be rejected by the verifier */
> +       switch (fail) {
> +       case MISSING_FREE:
> +               prog = skel->progs.missing_free;
> +               break;
> +       case MISSING_FREE_CALLBACK:
> +               prog = skel->progs.missing_free_callback;
> +               break;

[...]

> +               break;
> +       case GLOBAL:
> +               prog = skel->progs.global;
> +               break;
> +       case FREE_TWICE:
> +               prog = skel->progs.free_twice;
> +               break;
> +       case FREE_TWICE_CALLBACK:
> +               prog = skel->progs.free_twice_callback;
> +               break;
> +       default:
> +               fprintf(stderr, "unknown fail_case\n");
> +               return;
> +       }

so instead of maintaining this enum definition and the corresponding
mapping to a prog, you can just specify the program name as a string and
use bpf_object__find_program_by_name(). The only downside is that if
you make a typo in the program name or it is renamed, you'll catch it
at runtime rather than at compile time. But I think that's an acceptable
tradeoff.

> +
> +       bpf_program__set_autoload(prog, true);
> +
> +       err = dynptr_fail__load(skel);
> +
> +       ASSERT_OK_PTR(strstr(obj_log_buf, err_msg), "err_msg not found");

let's also print out the full log if something goes wrong. It will be
hard to debug when something (even the message itself) changes on the
verifier side.

> +
> +       ASSERT_ERR(err, "unexpected load success");

nit: move this before the log_buf check? it seems like it's logically
the first check you need to do

> +
> +       dynptr_fail__destroy(skel);
> +}
> +
> +static void run_prog(struct dynptr_success *skel, struct bpf_program *prog)
> +{
> +       struct bpf_link *link;
> +
> +       link = bpf_program__attach(prog);
> +       if (!ASSERT_OK_PTR(link, "bpf program attach"))

For ASSERT_xxx() macros, the second string argument is something like
an entity/variable name, not a message. See how ASSERT_xxx() uses it
internally. Keeping it short and identifier-like makes the actual
failure messages easier to follow.

> +               return;
> +
> +       usleep(1);
> +
> +       ASSERT_EQ(skel->bss->err, 0, "err");
> +
> +       bpf_link__destroy(link);
> +}
> +
> +static void verify_success(void)
> +{
> +       struct dynptr_success *skel;
> +
> +       skel = dynptr_success__open();
> +
> +       skel->bss->pid = getpid();
> +
> +       dynptr_success__load(skel);
> +       if (!ASSERT_OK_PTR(skel, "dynptr__open_and_load"))
> +               return;
> +
> +       run_prog(skel, skel->progs.prog_success);
> +       run_prog(skel, skel->progs.prog_success_data_slice);
> +       run_prog(skel, skel->progs.prog_success_ringbuf);

let's keep it generic here too, as with the negative tests, and pass
the program name? It will be easier to extend such a framework

> +
> +       dynptr_success__destroy(skel);
> +}
> +
> +void test_dynptr(void)
> +{
> +       char *obj_log_buf;
> +
> +       obj_log_buf = malloc(3 * log_buf_sz);
> +       if (!ASSERT_OK_PTR(obj_log_buf, "obj_log_buf"))
> +               return;
> +       obj_log_buf[0] = '\0';

I'd keep it simple and just have a global static log buf of the
necessary size. Fewer parameters to pass as well

> +
> +       if (test__start_subtest("missing_free"))
> +               verify_fail(MISSING_FREE, obj_log_buf,
> +                           "spi=0 is an unreleased dynptr");
> +

[...]

> +       if (test__start_subtest("free_twice_callback"))
> +               verify_fail(FREE_TWICE_CALLBACK, obj_log_buf,
> +                           "arg #1 is an unacquired reference and hence cannot be released");
> +
> +       if (test__start_subtest("success"))
> +               verify_success();

so instead of manually coded set of tests, it's more "scalable" to go
with table-driven approach. Something like

struct {
    const char *prog_name;
    const char *exp_msg;
} tests[] = {
  {"invalid_read2", "Expected an initialized dynptr as arg #3"},
  {"prog_success_ringbuf", NULL /* success case */},
  ...
};

then you can just succinctly:

for (i = 0; i < ARRAY_SIZE(tests); i++) {
  if (!test__start_subtest(tests[i].prog_name))
    continue;

  if (tests[i].exp_msg)
    verify_fail(tests[i].prog_name, tests[i].exp_msg);
  else
    verify_success(tests[i].prog_name);
}

Then adding new cases would be only adding BPF code and adding a
single line in the tests table.

> +
> +       free(obj_log_buf);
> +}

[...]

> +/* Can't call non-dynptr ringbuf APIs on a dynptr ringbuf sample */
> +SEC("raw_tp/sys_nanosleep")
> +int ringbuf_invalid_api(void *ctx)
> +{
> +       struct bpf_dynptr ptr;
> +       struct sample *sample;
> +
> +       err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
> +       sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
> +       if (!sample)
> +               goto done;
> +
> +       sample->pid = 123;
> +
> +       /* invalid API use. need to use dynptr API to submit/discard */
> +       bpf_ringbuf_submit(sample, 0);

This will also be rejected due to the missing discard_dynptr() in this
code path, right? But if you remove the return 0 below and fall through
into done, that goes away.

> +
> +       return 0;
> +
> +done:
> +       bpf_ringbuf_discard_dynptr(&ptr, 0);
> +       return 0;
> +}

[...]

> +/* A dynptr can't be passed into a helper function at a non-zero offset */
> +SEC("raw_tp/sys_nanosleep")
> +int invalid_helper2(void *ctx)
> +{
> +       struct bpf_dynptr ptr = {};
> +       char read_data[64] = {};
> +       __u64 x = 0;
> +
> +       bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
> +
> +       /* this should fail */
> +       bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0);
> +
> +       return 0;
> +}
> +
> +/* A data slice can't be accessed out of bounds */
> +SEC("fentry/" SYS_PREFIX "sys_nanosleep")

why switch to fentry here, with this ugly SYS_PREFIX thingy?

> +int data_slice_out_of_bounds(void *ctx)
> +{
> +       struct bpf_dynptr ptr = {};
> +       void *data;
> +
> +       bpf_malloc(8, &ptr);
> +
> +       data = bpf_dynptr_data(&ptr, 0, 8);
> +       if (!data)
> +               goto done;
> +
> +       /* can't index out of bounds of the data slice */
> +       val = *((char *)data + 8);
> +
> +done:
> +       bpf_free(&ptr);
> +       return 0;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 0/7] Dynamic pointers
  2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
                   ` (6 preceding siblings ...)
  2022-04-02  1:58 ` [PATCH bpf-next v1 7/7] bpf: Dynptr tests Joanne Koong
@ 2022-04-06 23:13 ` Andrii Nakryiko
  2022-04-07 12:44   ` Brendan Jackman
  7 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-06 23:13 UTC (permalink / raw)
  To: Joanne Koong, KP Singh, Florent Revest, Brendan Jackman
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Fri, Apr 1, 2022 at 6:59 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> From: Joanne Koong <joannelkoong@gmail.com>
>
> This patchset implements the basics of dynamic pointers in bpf.
>
> A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> alongside the address it points to. This abstraction is useful in bpf, given
> that every memory access in a bpf program must be safe. The verifier and bpf
> helper functions can use the metadata to enforce safety guarantees for things
> such as dynamically sized strings and kernel heap allocations.
>
> From the program side, the bpf_dynptr is an opaque struct and the verifier
> will enforce that its contents are never written to by the program.
> It can only be written to through specific bpf helper functions.
>
> There are several use cases for dynamic pointers in bpf programs. A list of
> some are: dynamically sized ringbuf reservations without any extra memcpys,
> dynamic string parsing and memory comparisons, dynamic memory allocations that
> can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> data.
>
> At a high-level, the patches are as follows:
> 1/7 - Adds MEM_UNINIT as a bpf_type_flag
> 2/7 - Adds MEM_RELEASE as a bpf_type_flag
> 3/7 - Adds bpf_dynptr_from_mem, bpf_malloc, and bpf_free
> 4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
> 5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
> 6/7 - Adds dynptr support for ring buffers
> 7/7 - Tests to check that verifier rejects certain fail cases and passes
> certain success cases
>
> This is the first dynptr patchset in a larger series. The next series of
> patches will add persisting dynamic memory allocations in maps, parsing packet
> data through dynptrs, dynptrs to referenced objects, convenience helpers for
> using dynptrs as iterators, and more helper functions for interacting with
> strings and memory dynamically.
>
> Joanne Koong (7):
>   bpf: Add MEM_UNINIT as a bpf_type_flag
>   bpf: Add MEM_RELEASE as a bpf_type_flag
>   bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
>   bpf: Add bpf_dynptr_read and bpf_dynptr_write
>   bpf: Add dynptr data slices
>   bpf: Dynptr support for ring buffers
>   bpf: Dynptr tests
>
>  include/linux/bpf.h                           | 107 +++-
>  include/linux/bpf_verifier.h                  |  23 +-
>  include/uapi/linux/bpf.h                      | 100 ++++
>  kernel/bpf/bpf_lsm.c                          |   4 +-
>  kernel/bpf/btf.c                              |   3 +-
>  kernel/bpf/cgroup.c                           |   4 +-
>  kernel/bpf/helpers.c                          | 190 ++++++-
>  kernel/bpf/ringbuf.c                          |  75 ++-
>  kernel/bpf/stackmap.c                         |   6 +-
>  kernel/bpf/verifier.c                         | 406 ++++++++++++--
>  kernel/trace/bpf_trace.c                      |  20 +-
>  net/core/filter.c                             |  28 +-
>  scripts/bpf_doc.py                            |   2 +
>  tools/include/uapi/linux/bpf.h                | 100 ++++
>  .../testing/selftests/bpf/prog_tests/dynptr.c | 303 ++++++++++
>  .../testing/selftests/bpf/progs/dynptr_fail.c | 527 ++++++++++++++++++
>  .../selftests/bpf/progs/dynptr_success.c      | 147 +++++
>  17 files changed, 1955 insertions(+), 90 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
>

KP, Florent, Brendan,

You always wanted a way to work with runtime-sized BPF ringbuf samples
without extra copies. This is the way we can finally do this with good
usability and simplicity. Please take a look and provide feedback.
Thanks!

> --
> 2.30.2
>


* Re: [PATCH bpf-next v1 0/7] Dynamic pointers
  2022-04-06 23:13 ` [PATCH bpf-next v1 0/7] Dynamic pointers Andrii Nakryiko
@ 2022-04-07 12:44   ` Brendan Jackman
  2022-04-07 20:40     ` Joanne Koong
  0 siblings, 1 reply; 32+ messages in thread
From: Brendan Jackman @ 2022-04-07 12:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, KP Singh, Florent Revest, bpf, Andrii Nakryiko,
	Alexei Starovoitov, Daniel Borkmann, Joanne Koong

On Thu, 7 Apr 2022 at 01:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 6:59 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > From: Joanne Koong <joannelkoong@gmail.com>
> KP, Florent, Brendan,
>
> You always wanted a way to work with runtime-sized BPF ringbuf samples
> without extra copies. This is the way we can finally do this with good
> usability and simplicity. Please take a look and provide feedback.
> Thanks!

Thanks folks, this looks very cool. Please excuse my ignorance - one
thing that isn't clear to me: does this work for user memory? Or
would we need bpf_copy_from_user_dynptr to avoid an extra copy?


* Re: [PATCH bpf-next v1 0/7] Dynamic pointers
  2022-04-07 12:44   ` Brendan Jackman
@ 2022-04-07 20:40     ` Joanne Koong
  2022-04-08 10:21       ` Brendan Jackman
  0 siblings, 1 reply; 32+ messages in thread
From: Joanne Koong @ 2022-04-07 20:40 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrii Nakryiko, Joanne Koong, KP Singh, Florent Revest, bpf,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Thu, Apr 7, 2022 at 5:44 AM Brendan Jackman <jackmanb@google.com> wrote:
>
> On Thu, 7 Apr 2022 at 01:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Apr 1, 2022 at 6:59 PM Joanne Koong <joannekoong@fb.com> wrote:
> > >
> > > From: Joanne Koong <joannelkoong@gmail.com>
> > KP, Florent, Brendan,
> >
> > You always wanted a way to work with runtime-sized BPF ringbuf samples
> > without extra copies. This is the way we can finally do this with good
> > usability and simplicity. Please take a look and provide feedback.
> > Thanks!
>
> Thanks folks, this looks very cool. Please excuse my ignorance, one
> thing that isn't clear to me is does this work for user memory? Or
> would we need bpf_copy_from_user_dynptr to avoid an extra copy?

Userspace programs will not be able to use or interact with dynptrs
directly. If there is data at a user-space address that needs to be
copied into the ringbuffer, the address can be passed to the bpf
program and then the bpf program can use a helper like
bpf_probe_read_user_dynptr (not added in this patchset but will be
part of a following one), which will read the contents at that user
address into the ringbuf dynptr.


* Re: [PATCH bpf-next v1 0/7] Dynamic pointers
  2022-04-07 20:40     ` Joanne Koong
@ 2022-04-08 10:21       ` Brendan Jackman
  0 siblings, 0 replies; 32+ messages in thread
From: Brendan Jackman @ 2022-04-08 10:21 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Andrii Nakryiko, Joanne Koong, KP Singh, Florent Revest, bpf,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Thu, 7 Apr 2022 at 22:40, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Apr 7, 2022 at 5:44 AM Brendan Jackman <jackmanb@google.com> wrote:
> >
> > On Thu, 7 Apr 2022 at 01:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Apr 1, 2022 at 6:59 PM Joanne Koong <joannekoong@fb.com> wrote:
> > > >
> > > > From: Joanne Koong <joannelkoong@gmail.com>
> > > KP, Florent, Brendan,
> > >
> > > You always wanted a way to work with runtime-sized BPF ringbuf samples
> > > without extra copies. This is the way we can finally do this with good
> > > usability and simplicity. Please take a look and provide feedback.
> > > Thanks!
> >
> > Thanks folks, this looks very cool. Please excuse my ignorance, one
> > thing that isn't clear to me is does this work for user memory? Or
> > would we need bpf_copy_from_user_dynptr to avoid an extra copy?
>
> Userspace programs will not be able to use or interact with dynptrs
> directly. If there is data at a user-space address that needs to be
> copied into the ringbuffer, the address can be passed to the bpf
> program and then the bpf program can use a helper like
> bpf_probe_read_user_dynptr (not added in this patchset but will be
> part of a following one), which will read the contents at that user
> address into the ringbuf dynptr.

Ah yeah right, this is not for userspace programs, just for programs in
the kernel that need to read user memory; the specific case I'm thinking
of is reading the argv/env from the exec path.
bpf_probe_read_user_dynptr sounds for sure like it would solve that
case.


* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-06 22:23   ` Andrii Nakryiko
@ 2022-04-08 22:04     ` Joanne Koong
  2022-04-08 22:46       ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Joanne Koong @ 2022-04-08 22:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Wed, Apr 6, 2022 at 3:23 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > This patch adds 3 new APIs and the bulk of the verifier work for
> > supporting dynamic pointers in bpf.
> >
> > There are different types of dynptrs. This patch starts with the most
> > basic ones, ones that reference a program's local memory
> > (eg a stack variable) and ones that reference memory that is dynamically
> > allocated on behalf of the program. If the memory is dynamically
> > allocated by the program, the program *must* free it before the program
> > exits. This is enforced by the verifier.
> >
> > The added APIs are:
> >
> > long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
> > long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
> > void bpf_free(struct bpf_dynptr *ptr);
> >
> > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > reside on the program's stack frame. As such, their state is tracked
> > in their corresponding stack slot, which includes the type of dynptr
> > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> >
> > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > where the dynptr resides at is marked as STACK_DYNPTR. For helper functions
> > that take in initialized dynptrs (such as the next patch in this series
> > which supports dynptr reads/writes), the verifier enforces that the
> > dynptr has been initialized by checking that their corresponding stack
> > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > (eg bpf_free) will clear the stack slots. The verifier enforces at program
> > exit that there are no dynptr stack slots that need to be released.
> >
> > There are other constraints that are enforced by the verifier as
> > well, such as that the dynptr cannot be written to directly by the bpf
> > program or by non-dynptr helper functions. The last patch in this series
> > contains tests that trigger different cases that the verifier needs to
> > successfully reject.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  74 ++++++++-
> >  include/linux/bpf_verifier.h   |  18 +++
> >  include/uapi/linux/bpf.h       |  40 +++++
> >  kernel/bpf/helpers.c           |  88 +++++++++++
> >  kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
> >  scripts/bpf_doc.py             |   2 +
> >  tools/include/uapi/linux/bpf.h |  40 +++++
> >  7 files changed, 521 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index cb9f42866cde..e0fcff9f2aee 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -346,7 +346,13 @@ enum bpf_type_flag {
> >
> >         MEM_RELEASE             = BIT(6 + BPF_BASE_TYPE_BITS),
> >
> > -       __BPF_TYPE_LAST_FLAG    = MEM_RELEASE,
> > +       /* DYNPTR points to a program's local memory (eg stack variable). */
> > +       DYNPTR_TYPE_LOCAL       = BIT(7 + BPF_BASE_TYPE_BITS),
> > +
> > +       /* DYNPTR points to dynamically allocated memory. */
> > +       DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
> > +
> > +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
> >  };
> >
> >  /* Max number of base types. */
> > @@ -390,6 +396,7 @@ enum bpf_arg_type {
> >         ARG_PTR_TO_STACK,       /* pointer to stack */
> >         ARG_PTR_TO_CONST_STR,   /* pointer to a null terminated read-only string */
> >         ARG_PTR_TO_TIMER,       /* pointer to bpf_timer */
> > +       ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
> >         __BPF_ARG_TYPE_MAX,
> >
> >         /* Extended arg_types. */
> > @@ -2396,4 +2403,69 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
> >                         u32 **bin_buf, u32 num_args);
> >  void bpf_bprintf_cleanup(void);
> >
> > +/* the implementation of the opaque uapi struct bpf_dynptr */
> > +struct bpf_dynptr_kern {
> > +       u8 *data;
>
> nit: u8 * is too specific, it's not always "bytes" of data. Let's use `void *`?
Sounds great! My reason for going with u8 * instead of void * was that
void pointer arithmetic in C is invalid - but it seems like this isn't
something we have to worry about here, since gcc is the default
compiler for Linux and allows it as an extension.
>
> > +       /* The upper 4 bits are reserved. Bit 29 denotes whether the
> > +        * dynptr is read-only. Bits 30 - 32 denote the dynptr type.
> > +        */
>
> not essential, but I think using highest bit for read-only and then
> however many next upper bits for dynptr kind is a bit cleaner
> approach.
I'm happy with either - I was thinking that if the uppermost bits are
the dynptr kind, that makes it easiest to get the dynptr type (simply
size >> DYNPTR_TYPE_SHIFT), whereas if the read-only bit is the highest
bit, then we also need to clear that out. But not a big deal :)
>
> also it seems like normally bits are zero-indexed, so, pedantically,
> there is no bit 32, it's bit #31
>
> > +       u32 size;
> > +       u32 offset;
>
> Let's document the semantics of offset and size. E.g., if I have
> offset 4 and size 20, does it mean there were 24 bytes, but we ignore
> first 4 and can address next 20, or does it mean that there is 20
> bytes, we skip first 4 and have 16 addressable. Basically, usable size
> is just size - offset? That will change how/whether the size
> is adjusted when offset is moved.
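For concreteness, a small userspace sketch of the second interpretation
(usable capacity is size - offset, so offset 4 with size 20 leaves 16
addressable bytes), mirroring the bpf_dynptr_check_off_len() logic from
this patch (struct and function names here are illustrative):

```c
#include <assert.h>

struct dynptr_sketch { unsigned int size, offset; };

/* mirrors bpf_dynptr_check_off_len(): off/len are validated against the
 * addressable window, whose capacity is size - offset */
static int check_off_len(const struct dynptr_sketch *p,
			 unsigned int off, unsigned int len)
{
	unsigned int capacity = p->size - p->offset;

	if (len > capacity || off > capacity - len)
		return -1;
	return 0;
}
```

With size 20 and offset 4, reading 16 bytes at offset 0 succeeds, while
reading 16 bytes at offset 1 or 17 bytes at offset 0 is rejected.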
>
> > +} __aligned(8);
> > +
> > +enum bpf_dynptr_type {
>
> it's a good idea to have default zero value to be BPF_DYNPTR_TYPE_INVALID
>
> > +       /* Local memory used by the bpf program (eg stack variable) */
> > +       BPF_DYNPTR_TYPE_LOCAL,
> > +       /* Memory allocated dynamically by the kernel for the dynptr */
> > +       BPF_DYNPTR_TYPE_MALLOC,
> > +};
> > +
> > +/* The upper 4 bits of dynptr->size are reserved. Consequently, the
> > + * maximum supported size is 2^28 - 1.
> > + */
> > +#define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
> > +#define DYNPTR_SIZE_MASK       0xFFFFFFF
> > +#define DYNPTR_TYPE_SHIFT      29
>
> I'm thinking that maybe we should start with reserving entire upper
> byte in size and offset to be on the safer side? And if 16MB of
> addressable memory blob isn't enough, we can always relax it later.
> WDYT?
>
This sounds great! I will make this change for v2
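As a sketch of what reserving the upper byte could look like (the exact
bit assignments and macro names here are illustrative, not what v2 will
necessarily use): the highest bit for the read-only flag, a few bits
below it for the type, the low 24 bits for size (capped at 2^24 - 1),
and the remaining reserved bits left unused:

```c
#include <assert.h>

#define DYNPTR_SIZE_MASK	0x00FFFFFFu	/* max size 2^24 - 1 */
#define DYNPTR_RDONLY_BIT	(1u << 31)
#define DYNPTR_TYPE_SHIFT	28
#define DYNPTR_TYPE_MASK	(0x7u << DYNPTR_TYPE_SHIFT)

static unsigned int pack(unsigned int size, unsigned int type, int rdonly)
{
	return (size & DYNPTR_SIZE_MASK) |
	       (type << DYNPTR_TYPE_SHIFT) |
	       (rdonly ? DYNPTR_RDONLY_BIT : 0);
}

static unsigned int get_size(unsigned int packed)
{
	return packed & DYNPTR_SIZE_MASK;
}

static unsigned int get_type(unsigned int packed)
{
	return (packed & DYNPTR_TYPE_MASK) >> DYNPTR_TYPE_SHIFT;
}

static int is_rdonly(unsigned int packed)
{
	return !!(packed & DYNPTR_RDONLY_BIT);
}
```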
> > +
> > +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> > +{
> > +       return ptr->size >> DYNPTR_TYPE_SHIFT;
> > +}
> > +
> > +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> > +{
> > +       ptr->size |= type << DYNPTR_TYPE_SHIFT;
> > +}
> > +
> > +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> > +{
> > +       return ptr->size & DYNPTR_SIZE_MASK;
> > +}
> > +
> > +static inline int bpf_dynptr_check_size(u32 size)
> > +{
> > +       if (size == 0)
> > +               return -EINVAL;
>
> What's the downside of allowing size 0? Honest question. I'm wondering
> why prevent having dynptr pointing to an "empty slice"? It might be a
> useful feature in practice.
I don't see the use of dynptrs that point to something of size 0, so I
thought it'd be simplest to just return an -EINVAL if the user tries
to create one. I don't have a particular preference for handling this
though - especially if this will be a useful feature in the future,
then I agree we should just let the user create and use empty slices
if they wish to.
>
> > +
> > +       if (size > DYNPTR_MAX_SIZE)
> > +               return -E2BIG;
> > +
> > +       return 0;
> > +}
> > +
> > +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> > +{
> > +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
> > +
> > +       if (len > capacity || offset > capacity - len)
> > +               return -EINVAL;
> > +
> > +       return 0;
> > +}
> > +
> > +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> > +                    u32 offset, u32 size);
> > +
> > +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> > +
> >  #endif /* _LINUX_BPF_H */
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 7a01adc9e13f..bc0f105148f9 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -72,6 +72,18 @@ struct bpf_reg_state {
> >
> >                 u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> >
> > +               /* for dynptr stack slots */
> > +               struct {
> > +                       enum bpf_dynptr_type dynptr_type;
> > +                       /* A dynptr is 16 bytes so it takes up 2 stack slots.
> > +                        * We need to track which slot is the first slot
> > +                        * to protect against cases where the user may try to
> > +                        * pass in an address starting at the second slot of the
> > +                        * dynptr.
> > +                        */
> > +                       bool dynptr_first_slot;
> > +               };
>
> why not
>
> struct {
>     enum bpf_dynptr_type type;
>     bool first_slot;
> } dynptr;
>
> ? I think it's cleaner grouping
Agreed! I will make this change for v2
>
[...]
>
> > + *     Description
> > + *             Dynamically allocate memory of *size* bytes.
> > + *
> > + *             Every call to bpf_malloc must have a corresponding
> > + *             bpf_free, regardless of whether the bpf_malloc
> > + *             succeeded.
> > + *
> > + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> > + *     Return
> > + *             0 on success, -ENOMEM if there is not enough memory for the
> > + *             allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
> > + *
> > + * void bpf_free(struct bpf_dynptr *ptr)
>
> thinking about the next patch set that will add storing this malloc
> dynptr into the map, bpf_free() will be a lie, right? As it will only
> decrement a refcnt, not necessarily free it, right? So maybe just
> generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> a bit more "truthful"?
I like the simplicity of bpf_free(), but I can see how that might be
confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
we get into dynptrs that are stored in maps vs. dynptrs stored
locally, calling bpf_dynptr_free() frees (invalidates) your local
dynptr even if it doesn't free the underlying memory if it still has
valid refcounts on it? To me, "malloc" and "_free" go more intuitively
together as a pair.
>
> > + *     Description
> > + *             Free memory allocated by bpf_malloc.
> > + *
> > + *             After this operation, *ptr* will be an invalidated dynptr.
> > + *     Return
> > + *             Void.
> >   */
[...]
> > +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> > +       .func           = bpf_dynptr_from_mem,
> > +       .gpl_only       = false,
> > +       .ret_type       = RET_INTEGER,
> > +       .arg1_type      = ARG_PTR_TO_MEM,
>
> need to think what to do with uninit stack slots. Do we need
> bpf_dynptr_from_uninit_mem() or we just allow ARG_PTR_TO_MEM |
> MEM_UNINIT here?
I think we can just change this to ARG_PTR_TO_MEM | MEM_UNINIT.
>
> > +       .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
> > +       .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> > +};
> > +
> > +BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
> > +{
> > +       void *data;
> > +       int err;
> > +
> > +       err = bpf_dynptr_check_size(size);
> > +       if (err) {
> > +               bpf_dynptr_set_null(ptr);
> > +               return err;
> > +       }
> > +
> > +       data = kmalloc(size, GFP_ATOMIC);
>
> we have this fancy logic now to allow non-atomic allocation inside
> sleepable programs, can we use that here as well? In sleepable mode it
> would be nice to wait for malloc() to grab necessary memory, if
> possible.
Agreed - I'm planning to do this in a later "dynptr optimizations"
patchset (which will also include inlining BPF instructions for some
of the helper functions)
>
> > +       if (!data) {
> > +               bpf_dynptr_set_null(ptr);
> > +               return -ENOMEM;
> > +       }
> > +
>
> so.... kmalloc() doesn't zero initialize the memory. I think it's a
> great property (which we can later modify with flags, if necessary),
> so I'd do zero-initialization by default. we can keep calling it
> bpf_malloc() instead of bpf_zalloc(), of course.
>
[...]
> > +static inline int get_spi(s32 off)
> > +{
> > +       return (-off - 1) / BPF_REG_SIZE;
> > +}
> > +
> > +static bool check_spi_bounds(struct bpf_func_state *state, int spi, u32 nr_slots)
>
> "check_xxx"/"validate_xxx" pattern has ambiguity when it comes to
> interpreting its return value. In some cases it would be 0 for success
> and <0 for error, in this it's true/false where probably true meaning
> all good. It's unfortunate to have to think about this when reading
> code. If you call it something like "is_stack_range_valid" it would be
> much more natural to read and reason about, IMO.
Great point! I'll change this naming for v2
>
> BTW, what does "spi" stand for? "stack pointer index"? slot_idx?
It's not formally documented anywhere but I assume it's short for
"stack pointer index".
>
> > +{
> > +       int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > +
> > +       return allocated_slots > spi && nr_slots - 1 <= spi;
>
> ok, this is personal preferences, but it took me considerable time to
> try to understand what's being checked here (this backwards grow of
> slot indices also threw me off). But seems like we have a range of
> slots that are calculated as [spi - nr_slots + 1, spi] and we want to
> check that it's within [0, allocated_stack), so most straightforward
> way would be:
>
> return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
>
> And I'd definitely leave a comment about this whole index grows
> downwards (it's not immediately obvious even if you know that indices
> are derived from negative stack offsets)
Awesome, I will make these edits for v2
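A quick userspace brute-force sketch (using signed ints to sidestep the
unsigned-promotion pitfalls mentioned above) confirming the two
formulations of the bounds check are equivalent:

```c
#include <assert.h>

/* original: allocated_slots > spi && nr_slots - 1 <= spi
 * proposed: spi - nr_slots + 1 >= 0 && spi < allocated_slots
 * The checked range of slots is [spi - nr_slots + 1, spi], which must
 * lie within [0, allocated_slots). */
static int bounds_orig(int allocated_slots, int spi, int nr_slots)
{
	return allocated_slots > spi && nr_slots - 1 <= spi;
}

static int bounds_proposed(int allocated_slots, int spi, int nr_slots)
{
	return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
}

static void check_equivalence(void)
{
	int alloc, spi, nr;

	for (alloc = 0; alloc <= 8; alloc++)
		for (spi = 0; spi <= 8; spi++)
			for (nr = 1; nr <= 4; nr++)
				assert(bounds_orig(alloc, spi, nr) ==
				       bounds_proposed(alloc, spi, nr));
}
```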
>
[...]
> > +       switch (type) {
> > +       case DYNPTR_TYPE_LOCAL:
> > +               *dynptr_type = BPF_DYNPTR_TYPE_LOCAL;
> > +               break;
> > +       case DYNPTR_TYPE_MALLOC:
> > +               *dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
> > +               break;
> > +       default:
> > +               /* Can't have more than one type set and can't have no
> > +                * type set
> > +                */
> > +               return -EINVAL;
>
> see above about BPF_DYNPTR_TYPE_INVALID, with that you don't have to
> use out parameter, just return enum bpf_dynptr_type directly with
> BPF_DYNPTR_TYPE_INVALID marking an error
Nice! I love this suggestion - it makes this a lot smoother.
>
> > +       }
> > +
> > +       return 0;
> > +}
> > +
[...]
> > +
> > +/* Check if the dynptr argument is a proper initialized dynptr */
> > +static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                             enum bpf_arg_type arg_type)
>
> is_dynptr_valid()? You are not checking if it's just initialized but
> also that it matches arg_type, right? Also see my rambling about
> check_xxx naming
I will rename this to is_dynptr_init_valid(). is_dynptr_valid() might
be too generic - for example, a valid dynptr should also have its
stack slots marked accordingly, which isn't true here since this
dynptr is uninitialized. I think is_dynptr_init_valid() will be
clearest
>
> > +{
> > +       struct bpf_func_state *state = func(env, reg);
> > +       enum bpf_dynptr_type expected_type;
> > +       int spi, err;
> > +
> > +       /* Can't pass in a dynptr at a weird offset */
> > +       if (reg->off % BPF_REG_SIZE)
> > +               return false;
> > +
> > +       spi = get_spi(reg->off);
> > +
> > +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +               return false;
> > +
> > +       if (!state->stack[spi].spilled_ptr.dynptr_first_slot)
> > +               return false;
> > +
> > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > +               return false;
> > +
> > +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > +       if (arg_type == ARG_PTR_TO_DYNPTR)
> > +               return true;
> > +
> > +       err = arg_to_dynptr_type(arg_type, &expected_type);
> > +       if (unlikely(err))
> > +               return err;
> > +
> > +       return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
> > +}
[...]
> > +/*
> > + * Determines whether the id used for reference tracking is held in a stack slot
> > + * or in a register
> > + */
> > +static bool id_in_stack_slot(enum bpf_arg_type arg_type)
>
> is_ or has_ is a good idea for such bool-returning helpers (similarly
> for stack_access_into_dynptr above), otherwise it reads like a verb
> and command to do something
>
> but looking few lines below, if (arg_type_is_dynptr()) would be
> clearer than extra wrapper function, not sure what's the purpose of
> the helper
My thinking behind this extra wrapper function was that it'd be more
extensible in the future if there are other types that will store
their id in the stack slot. But I think I'm over-optimizing here :)
I'll remove this wrapper function
>
[...]
> > @@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> >
> >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > +       } else if (arg_type_is_dynptr(arg_type)) {
> > +               bool initialized = check_dynptr_init(env, reg, arg_type);
> > +
> > +               if (type_is_uninit_mem(arg_type)) {
> > +                       if (initialized) {
> > +                               verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
> > +                                       arg + 1);
> > +                               return -EINVAL;
> > +                       }
> > +                       meta->raw_mode = true;
> > +                       err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
> > +                       /* For now, we do not allow dynptrs to point to existing
> > +                        * refcounted memory
> > +                        */
> > +                       if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {
>
> hard-coded BPF_REG_1?

I'm viewing this as a temporary line because one of the patches in a
later dynptr patchset will enable support for local dynptrs to point
to existing refcounted memory. The alternative is to add a new
bpf_type_flag like NO_REFCOUNT and then remove that flag later. What
are your thoughts?
>
> > +                               verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
> > +                                       arg + 1);
> > +                               return -EINVAL;
> > +                       }
> > +               } else {
> > +                       if (!initialized) {
> > +                               char *err_extra = "";
>
> const char *
>
> > +
> > +                               if (arg_type & DYNPTR_TYPE_LOCAL)
> > +                                       err_extra = "local ";
> > +                               else if (arg_type & DYNPTR_TYPE_MALLOC)
> > +                                       err_extra = "malloc ";
> > +                               verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > +                                       err_extra, arg + 1);
>
> what if helper accepts two or more different types of dynptr?
Currently, bpf_dynptr_read/write accept any type of dynptr so they
don't set any dynptr type flag, which means this error would just
print "Expected an initialized dynptr as arg...". But you're right
that in the future, there can be some API that accepts only a subset
(eg mallocs and ringbuffers and not local dynptrs); in this case,
maybe the simplest is just to return a generic "Expected an
initialized dynptr as arg...". Do you think this suffices or do you
think it'd be worth the effort to print out the different types of
initialized dynptrs it expects?
>
> > +                               return -EINVAL;
> > +                       }
> > +                       if (type_is_release_mem(arg_type))
> > +                               err = unmark_stack_slots_dynptr(env, reg);
> > +               }
> >         } else if (arg_type_is_alloc_size(arg_type)) {
> >                 if (!tnum_is_const(reg->var_off)) {
> >                         verbose(env, "R%d is not a known constant'\n",
>
> [...]
Thanks for your feedback, Andrii!!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-08 22:04     ` Joanne Koong
@ 2022-04-08 22:46       ` Andrii Nakryiko
  2022-04-08 23:37         ` Joanne Koong
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-08 22:46 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Fri, Apr 8, 2022 at 3:04 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Apr 6, 2022 at 3:23 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> > >
> > > From: Joanne Koong <joannelkoong@gmail.com>
> > >
> > > This patch adds 3 new APIs and the bulk of the verifier work for
> > > supporting dynamic pointers in bpf.
> > >
> > > There are different types of dynptrs. This patch starts with the most
> > > basic ones, ones that reference a program's local memory
> > > (eg a stack variable) and ones that reference memory that is dynamically
> > > allocated on behalf of the program. If the memory is dynamically
> > > allocated by the program, the program *must* free it before the program
> > > exits. This is enforced by the verifier.
> > >
> > > The added APIs are:
> > >
> > > long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
> > > long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
> > > void bpf_free(struct bpf_dynptr *ptr);
> > >
> > > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > > reside on the program's stack frame. As such, their state is tracked
> > > in their corresponding stack slot, which includes the type of dynptr
> > > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> > >
> > > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > > where the dynptr resides are marked as STACK_DYNPTR. For helper functions
> > > that take in initialized dynptrs (such as the next patch in this series
> > > which supports dynptr reads/writes), the verifier enforces that the
> > > dynptr has been initialized by checking that their corresponding stack
> > > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > > (eg bpf_free) will clear the stack slots. The verifier enforces at program
> > > exit that there are no dynptr stack slots that need to be released.
> > >
> > > There are other constraints that are enforced by the verifier as
> > > well, such as that the dynptr cannot be written to directly by the bpf
> > > program or by non-dynptr helper functions. The last patch in this series
> > > contains tests that trigger different cases that the verifier needs to
> > > successfully reject.
> > >
> > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > ---
> > >  include/linux/bpf.h            |  74 ++++++++-
> > >  include/linux/bpf_verifier.h   |  18 +++
> > >  include/uapi/linux/bpf.h       |  40 +++++
> > >  kernel/bpf/helpers.c           |  88 +++++++++++
> > >  kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
> > >  scripts/bpf_doc.py             |   2 +
> > >  tools/include/uapi/linux/bpf.h |  40 +++++
> > >  7 files changed, 521 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index cb9f42866cde..e0fcff9f2aee 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -346,7 +346,13 @@ enum bpf_type_flag {
> > >
> > >         MEM_RELEASE             = BIT(6 + BPF_BASE_TYPE_BITS),
> > >
> > > -       __BPF_TYPE_LAST_FLAG    = MEM_RELEASE,
> > > +       /* DYNPTR points to a program's local memory (eg stack variable). */
> > > +       DYNPTR_TYPE_LOCAL       = BIT(7 + BPF_BASE_TYPE_BITS),
> > > +
> > > +       /* DYNPTR points to dynamically allocated memory. */
> > > +       DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
> > > +
> > > +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
> > >  };
> > >
> > >  /* Max number of base types. */
> > > @@ -390,6 +396,7 @@ enum bpf_arg_type {
> > >         ARG_PTR_TO_STACK,       /* pointer to stack */
> > >         ARG_PTR_TO_CONST_STR,   /* pointer to a null terminated read-only string */
> > >         ARG_PTR_TO_TIMER,       /* pointer to bpf_timer */
> > > +       ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
> > >         __BPF_ARG_TYPE_MAX,
> > >
> > >         /* Extended arg_types. */
> > > @@ -2396,4 +2403,69 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
> > >                         u32 **bin_buf, u32 num_args);
> > >  void bpf_bprintf_cleanup(void);
> > >
> > > +/* the implementation of the opaque uapi struct bpf_dynptr */
> > > +struct bpf_dynptr_kern {
> > > +       u8 *data;
> >
> > nit: u8 * is too specific, it's not always "bytes" of data. Let's use `void *`?
> Sounds great! My reason for going with u8 * instead of void * is that
> void pointer arithmetic in C is invalid - but it seems like this isn't
> something we have to worry about here since gcc is the default
> compiler for linux and gcc allows it as an extension

Right, we do void * arithmetic everywhere. I thought this restriction
is C++-specific. But there might be GCC extensions as well.


> >
> > > +       /* The upper 4 bits are reserved. Bit 29 denotes whether the
> > > +        * dynptr is read-only. Bits 30 - 32 denote the dynptr type.
> > > +        */
> >
> > not essential, but I think using highest bit for read-only and then
> > however many next upper bits for dynptr kind is a bit cleaner
> > approach.
> I'm happy with either - I was thinking if we have the uppermost bits
> be dynptr kind, then that makes it easiest to get the dynptr type
> (simply size >> DYNPTR_TYPE_SHIFT) whereas if the read-only bit is the
> highest bit, then we also need to clear that out. But not a big deal
> :)

I think you'll want to define DYNPTR_TYPE_MASK anyways and then it's
just (size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK.

> >
> > also it seems like normally bits are zero-indexed, so, pedantically,
> > there is no bit 32, it's bit #31
> >
> > > +       u32 size;
> > > +       u32 offset;
> >
> > Let's document the semantics of offset and size. E.g., if I have
> > offset 4 and size 20, does it mean there were 24 bytes, but we ignore
> > first 4 and can address next 20, or does it mean that there is 20
> > bytes, we skip first 4 and have 16 addressable. Basically, usable size
> > is just size of size - offset? That will change how/whether the size
> > is adjusted when offset is moved.
> >
> > > +} __aligned(8);
> > > +
> > > +enum bpf_dynptr_type {
> >
> > it's a good idea to have default zero value to be BPF_DYNPTR_TYPE_INVALID
> >
> > > +       /* Local memory used by the bpf program (eg stack variable) */
> > > +       BPF_DYNPTR_TYPE_LOCAL,
> > > +       /* Memory allocated dynamically by the kernel for the dynptr */
> > > +       BPF_DYNPTR_TYPE_MALLOC,
> > > +};
> > > +
> > > +/* The upper 4 bits of dynptr->size are reserved. Consequently, the
> > > + * maximum supported size is 2^28 - 1.
> > > + */
> > > +#define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
> > > +#define DYNPTR_SIZE_MASK       0xFFFFFFF
> > > +#define DYNPTR_TYPE_SHIFT      29
> >
> > I'm thinking that maybe we should start with reserving entire upper
> > byte in size and offset to be on the safer side? And if 16MB of
> > addressable memory blob isn't enough, we can always relaxed it later.
> > WDYT?
> >
> This sounds great! I will make this change for v2

sounds good

> > > +
> > > +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> > > +{
> > > +       return ptr->size >> DYNPTR_TYPE_SHIFT;
> > > +}
> > > +
> > > +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> > > +{
> > > +       ptr->size |= type << DYNPTR_TYPE_SHIFT;
> > > +}
> > > +
> > > +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> > > +{
> > > +       return ptr->size & DYNPTR_SIZE_MASK;
> > > +}
> > > +
> > > +static inline int bpf_dynptr_check_size(u32 size)
> > > +{
> > > +       if (size == 0)
> > > +               return -EINVAL;
> >
> > What's the downside of allowing size 0? Honest question. I'm wondering
> > why prevent having dynptr pointing to an "empty slice"? It might be a
> > useful feature in practice.
> I don't see the use of dynptrs that point to something of size 0, so I
> thought it'd be simplest to just return an -EINVAL if the user tries
> to create one. I don't have a particular preference for handling this
> though - especially if this will be a useful feature in the future,
> then I agree we should just let the user create and use empty slices
> if they wish to.

taking Go slices as an example, empty slice is a useful thing
sometimes, makes some algorithms more uniform.

> >
> > > +
> > > +       if (size > DYNPTR_MAX_SIZE)
> > > +               return -E2BIG;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> > > +{
> > > +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
> > > +
> > > +       if (len > capacity || offset > capacity - len)
> > > +               return -EINVAL;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> > > +                    u32 offset, u32 size);
> > > +
> > > +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> > > +
> > >  #endif /* _LINUX_BPF_H */
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index 7a01adc9e13f..bc0f105148f9 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -72,6 +72,18 @@ struct bpf_reg_state {
> > >
> > >                 u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> > >
> > > +               /* for dynptr stack slots */
> > > +               struct {
> > > +                       enum bpf_dynptr_type dynptr_type;
> > > +                       /* A dynptr is 16 bytes so it takes up 2 stack slots.
> > > +                        * We need to track which slot is the first slot
> > > +                        * to protect against cases where the user may try to
> > > +                        * pass in an address starting at the second slot of the
> > > +                        * dynptr.
> > > +                        */
> > > +                       bool dynptr_first_slot;
> > > +               };
> >
> > why not
> >
> > struct {
> >     enum bpf_dynptr_type type;
> >     bool first_slot;
> > } dynptr;
> >
> > ? I think it's cleaner grouping
> Agreed! I will make this change for v2
> >
> [...]
> >
> > > + *     Description
> > > + *             Dynamically allocate memory of *size* bytes.
> > > + *
> > > + *             Every call to bpf_malloc must have a corresponding
> > > + *             bpf_free, regardless of whether the bpf_malloc
> > > + *             succeeded.
> > > + *
> > > + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> > > + *     Return
> > > + *             0 on success, -ENOMEM if there is not enough memory for the
> > > + *             allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
> > > + *
> > > + * void bpf_free(struct bpf_dynptr *ptr)
> >
> > thinking about the next patch set that will add storing this malloc
> > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > decrement a refcnt, not necessarily free it, right? So maybe just
> > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > a bit more "truthful"?
> I like the simplicity of bpf_free(), but I can see how that might be
> confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> we get into dynptrs that are stored in maps vs. dynptrs stored
> locally, calling bpf_dynptr_free() frees (invalidates) your local
> dynptr even if it doesn't free the underlying memory if it still has
> valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> together as a pair.

Sounds good to me (though let's use _dynptr() as a suffix
consistently). I also just realized that maybe we should call
bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
want to enable plain bpf_malloc() with statically known size (similar
to statically known bpf_ringbuf_reserve()) for temporary local malloc
with direct memory access? So bpf_malloc_dynptr() would be a
dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
bpf_free() will work with direct pointer returned from bpf_malloc(),
while bpf_free_dynptr() will work with dynptr returned from
bpf_malloc_dynptr().

> >
> > > + *     Description
> > > + *             Free memory allocated by bpf_malloc.
> > > + *
> > > + *             After this operation, *ptr* will be an invalidated dynptr.
> > > + *     Return
> > > + *             Void.
> > >   */
> [...]
> > > +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> > > +       .func           = bpf_dynptr_from_mem,
> > > +       .gpl_only       = false,
> > > +       .ret_type       = RET_INTEGER,
> > > +       .arg1_type      = ARG_PTR_TO_MEM,
> >
> > need to think what to do with uninit stack slots. Do we need
> > bpf_dnptr_from_uninit_mem() or we just allow ARG_PTR_TO_MEM |
> > MEM_UNINIT here?
> I think we can just change this to ARG_PTR_TO_MEM | MEM_UNINIT.

sgtm

> >
> > > +       .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
> > > +       .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> > > +};
> > > +
> > > +BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
> > > +{
> > > +       void *data;
> > > +       int err;
> > > +
> > > +       err = bpf_dynptr_check_size(size);
> > > +       if (err) {
> > > +               bpf_dynptr_set_null(ptr);
> > > +               return err;
> > > +       }
> > > +
> > > +       data = kmalloc(size, GFP_ATOMIC);
> >
> > we have this fancy logic now to allow non-atomic allocation inside
> > sleepable programs, can we use that here as well? In sleepable mode it
> > would be nice to wait for malloc() to grab necessary memory, if
> > possible.
> Agreed - I'm planning to do this in a later "dynptr optimizations"
> patchset (which will also include inlining BPF instructions for some
> of the helper functions)

great, ok

> >
> > > +       if (!data) {
> > > +               bpf_dynptr_set_null(ptr);
> > > +               return -ENOMEM;
> > > +       }
> > > +
> >
> > so.... kmalloc() doesn't zero initialize the memory. I think it's a
> > great property (which we can later modify with flags, if necessary),
> > so I'd do zero-initialization by default. we can keep calling it
> > bpf_malloc() instead of bpf_zalloc(), of course.
> >
> [...]
> > > +static inline int get_spi(s32 off)
> > > +{
> > > +       return (-off - 1) / BPF_REG_SIZE;
> > > +}
> > > +
> > > +static bool check_spi_bounds(struct bpf_func_state *state, int spi, u32 nr_slots)
> >
> > "check_xxx"/"validate_xxx" pattern has ambiguity when it comes to
> > interpreting its return value. In some cases it would be 0 for success
> > and <0 for error, in this case it's true/false where true probably means
> > all good. It's unfortunate to have to think about this when reading
> > code. If you call it something like "is_stack_range_valid" it would be
> > much more natural to read and reason about, IMO.
> Great point! I'll change this naming for v2
> >
> > BTW, what does "spi" stand for? "stack pointer index"? slot_idx?
> It's not formally documented anywhere but I assume it's short for
> "stack pointer index".
> >
> > > +{
> > > +       int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > > +
> > > +       return allocated_slots > spi && nr_slots - 1 <= spi;
> >
> > ok, this is personal preferences, but it took me considerable time to
> > try to understand what's being checked here (this backwards grow of
> > slot indices also threw me off). But seems like we have a range of
> > slots that are calculated as [spi - nr_slots + 1, spi] and we want to
> > check that it's within [0, allocated_stack), so most straightforward
> > way would be:
> >
> > return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
> >
> > And I'd definitely leave a comment about this whole index grows
> > downwards (it's not immediately obvious even if you know that indices
> > are derived from negative stack offsets)
> Awesome, I will make these edits for v2
> >
> [...]
> > > +       switch (type) {
> > > +       case DYNPTR_TYPE_LOCAL:
> > > +               *dynptr_type = BPF_DYNPTR_TYPE_LOCAL;
> > > +               break;
> > > +       case DYNPTR_TYPE_MALLOC:
> > > +               *dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
> > > +               break;
> > > +       default:
> > > +               /* Can't have more than one type set and can't have no
> > > +                * type set
> > > +                */
> > > +               return -EINVAL;
> >
> > see above about BPF_DYNPTR_TYPE_INVALID, with that you don't have to
> > use out parameter, just return enum bpf_dynptr_type directly with
> > BPF_DYNPTR_TYPE_INVALID marking an error
> Nice! I love this suggestion - it makes this a lot smoother.
> >
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> [...]
> > > +
> > > +/* Check if the dynptr argument is a proper initialized dynptr */
> > > +static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > +                             enum bpf_arg_type arg_type)
> >
> > is_dynptr_valid()? You are not checking if it's just initialized but
> > also that it matches arg_type, right? Also see my rambling about
> > check_xxx naming
> I will rename this to is_dynptr_init_valid(). is_dynptr_valid() might
> be too generic - for example, a valid dynptr should also have its
> stack slots marked accordingly, which isn't true here since this
> dynptr is uninitialized. I think is_dynptr_init_valid() will be
> clearest

sgtm

> >
> > > +{
> > > +       struct bpf_func_state *state = func(env, reg);
> > > +       enum bpf_dynptr_type expected_type;
> > > +       int spi, err;
> > > +
> > > +       /* Can't pass in a dynptr at a weird offset */
> > > +       if (reg->off % BPF_REG_SIZE)
> > > +               return false;
> > > +
> > > +       spi = get_spi(reg->off);
> > > +
> > > +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> > > +               return false;
> > > +
> > > +       if (!state->stack[spi].spilled_ptr.dynptr_first_slot)
> > > +               return false;
> > > +
> > > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > > +               return false;
> > > +
> > > +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > > +       if (arg_type == ARG_PTR_TO_DYNPTR)
> > > +               return true;
> > > +
> > > +       err = arg_to_dynptr_type(arg_type, &expected_type);
> > > +       if (unlikely(err))
> > > +               return err;
> > > +
> > > +       return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
> > > +}
> [...]
> > > +/*
> > > + * Determines whether the id used for reference tracking is held in a stack slot
> > > + * or in a register
> > > + */
> > > +static bool id_in_stack_slot(enum bpf_arg_type arg_type)
> >
> > is_ or has_ is a good idea for such bool-returning helpers (similarly
> > for stack_access_into_dynptr above), otherwise it reads like a verb
> > and command to do something
> >
> > but looking few lines below, if (arg_type_is_dynptr()) would be
> > clearer than extra wrapper function, not sure what's the purpose of
> > the helper
> My thinking behind this extra wrapper function was that it'd be more
> extensible in the future if there are other types that will store
> their id in the stack slot. But I think I'm over-optimizing here :)
> I'll remove this wrapper function

yeah, it's internal implementation, so we can always refactor

> >
> [...]
> > > @@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > >
> > >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > +       } else if (arg_type_is_dynptr(arg_type)) {
> > > +               bool initialized = check_dynptr_init(env, reg, arg_type);
> > > +
> > > +               if (type_is_uninit_mem(arg_type)) {
> > > +                       if (initialized) {
> > > +                               verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
> > > +                                       arg + 1);
> > > +                               return -EINVAL;
> > > +                       }
> > > +                       meta->raw_mode = true;
> > > +                       err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
> > > +                       /* For now, we do not allow dynptrs to point to existing
> > > +                        * refcounted memory
> > > +                        */
> > > +                       if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {
> >
> > hard-coded BPF_REG_1?
>
> I'm viewing this as a temporary line because one of the patches in a
> later dynptr patchset will enable support for local dynptrs to point
> to existing refcounted memory. The alternative is to add a new
> bpf_type_flag like NO_REFCOUNT and then remove that flag later. What
> are your thoughts?

my concern and confusion was that it's a hard-coded BPF_REG_1 instead
of using arg to derive register itself. Why making unnecessary
assumptions about this always being a first argument?

> >
> > > +                               verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
> > > +                                       arg + 1);
> > > +                               return -EINVAL;
> > > +                       }
> > > +               } else {
> > > +                       if (!initialized) {
> > > +                               char *err_extra = "";
> >
> > const char *
> >
> > > +
> > > +                               if (arg_type & DYNPTR_TYPE_LOCAL)
> > > +                                       err_extra = "local ";
> > > +                               else if (arg_type & DYNPTR_TYPE_MALLOC)
> > > +                                       err_extra = "malloc ";
> > > +                               verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > > +                                       err_extra, arg + 1);
> >
> > what if helper accepts two or more different types of dynptr?
> Currently, bpf_dynptr_read/write accept any type of dynptr so they
> don't set any dynptr type flag, which means this error would just
> print "Expected an initialized dynptr as arg...". But you're right
> that in the future, there can be some API that accepts only a subset
> (eg mallocs and ringbuffers and not local dynptrs); in this case,
> maybe the simplest is just to return a generic "Expected an
> initialized dynptr as arg...". Do you think this suffices or do you
> think it'd be worth the effort to print out the different types of
> initialized dynptrs it expects?

Yeah, let's keep it simple with generic error, doing multiple string
concatenations for this seems like an overkill.

> >
> > > +                               return -EINVAL;
> > > +                       }
> > > +                       if (type_is_release_mem(arg_type))
> > > +                               err = unmark_stack_slots_dynptr(env, reg);
> > > +               }
> > >         } else if (arg_type_is_alloc_size(arg_type)) {
> > >                 if (!tnum_is_const(reg->var_off)) {
> > >                         verbose(env, "R%d is not a known constant'\n",
> >
> > [...]
> Thanks for your feedback, Andrii!!

sure, np

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-06 22:32   ` Andrii Nakryiko
@ 2022-04-08 23:07     ` Joanne Koong
  0 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-08 23:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Wed, Apr 6, 2022 at 3:32 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > This patch adds two helper functions, bpf_dynptr_read and
> > bpf_dynptr_write:
> >
> > long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);
> >
> > long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);
> >
> > The dynptr passed into these functions must be valid dynptrs that have
> > been initialized.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  6 ++++
> >  include/uapi/linux/bpf.h       | 18 +++++++++++
> >  kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 18 +++++++++++
> >  4 files changed, 98 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e0fcff9f2aee..cded9753fb7f 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2426,6 +2426,12 @@ enum bpf_dynptr_type {
> >  #define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
> >  #define DYNPTR_SIZE_MASK       0xFFFFFFF
> >  #define DYNPTR_TYPE_SHIFT      29
> > +#define DYNPTR_RDONLY_BIT      BIT(28)
> > +
> > +static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
> > +{
> > +       return ptr->size & DYNPTR_RDONLY_BIT;
> > +}
> >
> >  static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> >  {
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 6a57d8a1b882..16a35e46be90 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5175,6 +5175,22 @@ union bpf_attr {
> >   *             After this operation, *ptr* will be an invalidated dynptr.
> >   *     Return
> >   *             Void.
> > + *
> > + * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
> > + *     Description
> > + *             Read *len* bytes from *src* into *dst*, starting from *offset*
> > + *             into *src*.
> > + *     Return
> > + *             0 on success, -EINVAL if *offset* + *len* exceeds the length
> > + *             of *src*'s data or if *src* is an invalid dynptr.
> > + *
> > + * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
> > + *     Description
> > + *             Write *len* bytes from *src* into *dst*, starting from *offset*
> > + *             into *dst*.
> > + *     Return
> > + *             0 on success, -EINVAL if *offset* + *len* exceeds the length
> > + *             of *dst*'s data or if *dst* is not writeable.
>
> Did you plan to also add a helper to copy from one dynptr to another?
> Something like
>
> long bpf_dynptr_copy(struct bpf_dynptr *dst, struct bpf_dyn_ptr *src, u32 len) ?
>
> Otherwise there won't be any way to copy memory from malloc'ed range
> to ringbuf, for example, without doing intermediate copy. Not sure
> what to do about extra offsets...
Yes! I plan for the 3rd patchset in this dynptr series to be around
convenience helpers, which will include bpf_dynptr_copy.
For the offsets, I was thinking just copy from src data + src internal
offset to dst data + dst internal offset, where there will also be
dynptr helper functions that can be called to adjust offsets
>
> >   */
> >  #define __BPF_FUNC_MAPPER(FN)          \
> >         FN(unspec),                     \
> > @@ -5374,6 +5390,8 @@ union bpf_attr {
> >         FN(dynptr_from_mem),            \
> >         FN(malloc),                     \
> >         FN(free),                       \
> > +       FN(dynptr_read),                \
> > +       FN(dynptr_write),               \
> >         /* */
> >
> >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index ed5a7d9d0a18..7ec20e79928e 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -1412,6 +1412,58 @@ const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> >         .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> >  };
> >
> > +BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
> > +{
> > +       int err;
> > +
> > +       if (!src->data)
> > +               return -EINVAL;
> > +
> > +       err = bpf_dynptr_check_off_len(src, offset, len);
>
> you defined this function in patch #3, but didn't use it there. Let's
> move the definition into this patch?
Sounds great!
>
> > +       if (err)
> > +               return err;
> > +
> > +       memcpy(dst, src->data + src->offset + offset, len);
> > +
> > +       return 0;
> > +}
> > +
>
> [...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 7/7] bpf: Dynptr tests
  2022-04-06 23:11   ` Andrii Nakryiko
@ 2022-04-08 23:16     ` Joanne Koong
  0 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-08 23:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Wed, Apr 6, 2022 at 4:11 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > From: Joanne Koong <joannelkoong@gmail.com>
> >
> > This patch adds tests for dynptrs. These include scenarios that the
> > verifier needs to reject, as well as some successful use cases of
> > dynptrs that should pass.
> >
> > Some of the failure scenarios include checking against invalid bpf_frees,
> > invalid writes, invalid reads, and invalid ringbuf API usages.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
>
> Great set of tests! Hard to keep reading 500+ lines of failing use
> cases, but seems like a lot of interesting corner cases are handled!
> Great job!
>
> >  .../testing/selftests/bpf/prog_tests/dynptr.c | 303 ++++++++++
> >  .../testing/selftests/bpf/progs/dynptr_fail.c | 527 ++++++++++++++++++
> >  .../selftests/bpf/progs/dynptr_success.c      | 147 +++++
> >  3 files changed, 977 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> > new file mode 100644
> > index 000000000000..7107ebee3427
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
> > @@ -0,0 +1,303 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 Facebook */
> > +
> > +#include <test_progs.h>
> > +#include "dynptr_fail.skel.h"
> > +#include "dynptr_success.skel.h"
> > +
> > +size_t log_buf_sz = 1024 * 1024;
> > +
> > +enum fail_case {
> > +       MISSING_FREE,
> > +       MISSING_FREE_CALLBACK,
> > +       INVALID_FREE1,
> > +       INVALID_FREE2,
> > +       USE_AFTER_FREE,
> > +       MALLOC_TWICE,
> > +       INVALID_MAP_CALL1,
> > +       INVALID_MAP_CALL2,
> > +       RINGBUF_INVALID_ACCESS,
> > +       RINGBUF_INVALID_API,
> > +       RINGBUF_OUT_OF_BOUNDS,
> > +       DATA_SLICE_OUT_OF_BOUNDS,
> > +       DATA_SLICE_USE_AFTER_FREE,
> > +       INVALID_HELPER1,
> > +       INVALID_HELPER2,
> > +       INVALID_WRITE1,
> > +       INVALID_WRITE2,
> > +       INVALID_WRITE3,
> > +       INVALID_WRITE4,
> > +       INVALID_READ1,
> > +       INVALID_READ2,
> > +       INVALID_READ3,
> > +       INVALID_OFFSET,
> > +       GLOBAL,
> > +       FREE_TWICE,
> > +       FREE_TWICE_CALLBACK,
> > +};
>
> it might make sense to just pass the program name as a string instead,
> just like expected error message. This will allow more table-like
> subtest specification (I'll expand below)
>
[...]
> > +
> > +       if (test__start_subtest("missing_free"))
> > +               verify_fail(MISSING_FREE, obj_log_buf,
> > +                           "spi=0 is an unreleased dynptr");
> > +
>
> [...]
>
> > +       if (test__start_subtest("free_twice_callback"))
> > +               verify_fail(FREE_TWICE_CALLBACK, obj_log_buf,
> > +                           "arg #1 is an unacquired reference and hence cannot be released");
> > +
> > +       if (test__start_subtest("success"))
> > +               verify_success();
>
> so instead of manually coded set of tests, it's more "scalable" to go
> with table-driven approach. Something like
>
> struct {
>     const char *prog_name;
>     const char *exp_msg;
> } tests[] = {
>   {"invalid_read2", "Expected an initialized dynptr as arg #3"},
>   {"prog_success_ringbuf", NULL /* success case */},
>   ...
> };
>
> then you can just succinctly:
>
> for (i = 0; i < ARRAY_SIZE(tests); i++) {
>   if (!test__start_subtest(tests[i].prog_name))
>     continue;
>
>   if (tests[i].exp_msg)
>     verify_fail(tests[i].prog_name, tests[i].exp_msg);
>   else
>     verify_success(tests[i].prog_name);
> }
>
> Then adding new cases would be only adding BPF code and adding a
> single line in the tests table.
>
Awesome!! I love this. This will make it a lot easier to read!
> > +
> > +       free(obj_log_buf);
> > +}
[...]
>
> > +/* A dynptr can't be passed into a helper function at a non-zero offset */
> > +SEC("raw_tp/sys_nanosleep")
> > +int invalid_helper2(void *ctx)
> > +{
> > +       struct bpf_dynptr ptr = {};
> > +       char read_data[64] = {};
> > +       __u64 x = 0;
> > +
> > +       bpf_dynptr_from_mem(&x, sizeof(x), &ptr);
> > +
> > +       /* this should fail */
> > +       bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0);
> > +
> > +       return 0;
> > +}
> > +
> > +/* A data slice can't be accessed out of bounds */
> > +SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>
> why switching to fentry here with this ugly SYS_PREFIX thingy?
Ooh thanks for spotting this, I forgot to switch this over. Will
definitely change this for v2!
>
[...]


* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-08 22:46       ` Andrii Nakryiko
@ 2022-04-08 23:37         ` Joanne Koong
  2022-04-09  1:11           ` Alexei Starovoitov
  2022-04-12  2:05           ` Andrii Nakryiko
  0 siblings, 2 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-08 23:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Fri, Apr 8, 2022 at 3:46 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 3:04 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Apr 6, 2022 at 3:23 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> > > >
> > > > From: Joanne Koong <joannelkoong@gmail.com>
> > > >
> > > > This patch adds 3 new APIs and the bulk of the verifier work for
> > > > supporting dynamic pointers in bpf.
> > > >
> > > > There are different types of dynptrs. This patch starts with the most
> > > > basic ones, ones that reference a program's local memory
> > > > (eg a stack variable) and ones that reference memory that is dynamically
> > > > allocated on behalf of the program. If the memory is dynamically
> > > > allocated by the program, the program *must* free it before the program
> > > > exits. This is enforced by the verifier.
> > > >
> > > > The added APIs are:
> > > >
> > > > long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
> > > > long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
> > > > void bpf_free(struct bpf_dynptr *ptr);
> > > >
> > > > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > > > reside on the program's stack frame. As such, their state is tracked
> > > > in their corresponding stack slot, which includes the type of dynptr
> > > > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> > > >
> > > > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > > > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > > > where the dynptr resides are marked as STACK_DYNPTR. For helper functions
> > > > that take in initialized dynptrs (such as the next patch in this series
> > > > which supports dynptr reads/writes), the verifier enforces that the
> > > > dynptr has been initialized by checking that their corresponding stack
> > > > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > > > (eg bpf_free) will clear the stack slots. The verifier enforces at program
> > > > exit that there are no dynptr stack slots that need to be released.
> > > >
> > > > There are other constraints that are enforced by the verifier as
> > > > well, such as that the dynptr cannot be written to directly by the bpf
> > > > program or by non-dynptr helper functions. The last patch in this series
> > > > contains tests that trigger different cases that the verifier needs to
> > > > successfully reject.
> > > >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h            |  74 ++++++++-
> > > >  include/linux/bpf_verifier.h   |  18 +++
> > > >  include/uapi/linux/bpf.h       |  40 +++++
> > > >  kernel/bpf/helpers.c           |  88 +++++++++++
> > > >  kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
> > > >  scripts/bpf_doc.py             |   2 +
> > > >  tools/include/uapi/linux/bpf.h |  40 +++++
> > > >  7 files changed, 521 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index cb9f42866cde..e0fcff9f2aee 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -346,7 +346,13 @@ enum bpf_type_flag {
> > > >
> > > >         MEM_RELEASE             = BIT(6 + BPF_BASE_TYPE_BITS),
> > > >
> > > > -       __BPF_TYPE_LAST_FLAG    = MEM_RELEASE,
> > > > +       /* DYNPTR points to a program's local memory (eg stack variable). */
> > > > +       DYNPTR_TYPE_LOCAL       = BIT(7 + BPF_BASE_TYPE_BITS),
> > > > +
> > > > +       /* DYNPTR points to dynamically allocated memory. */
> > > > +       DYNPTR_TYPE_MALLOC      = BIT(8 + BPF_BASE_TYPE_BITS),
> > > > +
> > > > +       __BPF_TYPE_LAST_FLAG    = DYNPTR_TYPE_MALLOC,
> > > >  };
> > > >
> > > >  /* Max number of base types. */
> > > > @@ -390,6 +396,7 @@ enum bpf_arg_type {
> > > >         ARG_PTR_TO_STACK,       /* pointer to stack */
> > > >         ARG_PTR_TO_CONST_STR,   /* pointer to a null terminated read-only string */
> > > >         ARG_PTR_TO_TIMER,       /* pointer to bpf_timer */
> > > > +       ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
> > > >         __BPF_ARG_TYPE_MAX,
> > > >
> > > >         /* Extended arg_types. */
> > > > @@ -2396,4 +2403,69 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
> > > >                         u32 **bin_buf, u32 num_args);
> > > >  void bpf_bprintf_cleanup(void);
> > > >
> > > > +/* the implementation of the opaque uapi struct bpf_dynptr */
> > > > +struct bpf_dynptr_kern {
> > > > +       u8 *data;
> > >
> > > nit: u8 * is too specific, it's not always "bytes" of data. Let's use `void *`?
> > Sounds great! My reason for going with u8 * instead of void * is that
> > void pointer arithmetic in C is invalid - but it seems like this isn't
> > something we have to worry about here since gcc is the default
> > compiler for linux and gcc allows it as an extension
>
> Right, we do void * arithmetic everywhere. I thought this restriction
> is C++-specific. But there might be GCC extensions as well.
>
>
> > >
> > > > +       /* The upper 4 bits are reserved. Bit 29 denotes whether the
> > > > +        * dynptr is read-only. Bits 30 - 32 denote the dynptr type.
> > > > +        */
> > >
> > > not essential, but I think using highest bit for read-only and then
> > > however many next upper bits for dynptr kind is a bit cleaner
> > > approach.
> > I'm happy with either - I was thinking if we have the uppermost bits
> > be dynptr kind, then that makes it easiest to get the dynptr type
> > (simply size >> DYNPTR_TYPE_SHIFT) whereas if the read-only bit is the
> > highest bit, then we also need to clear that out. But not a big deal
> > :)
>
> I think you'll want to define DYNPTR_TYPE_MASK anyways and then it's
> just (size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK.
>
> > >
> > > also it seems like normally bits are zero-indexed, so, pedantically,
> > > there is no bit 32, it's bit #31
> > >
> > > > +       u32 size;
> > > > +       u32 offset;
> > >
> > > Let's document the semantics of offset and size. E.g., if I have
> > > offset 4 and size 20, does it mean there were 24 bytes, but we ignore
> > > first 4 and can address next 20, or does it mean that there is 20
> > > bytes, we skip first 4 and have 16 addressable. Basically, usable size
> > > is just size of size - offset? That will change how/whether the size
> > > is adjusted when offset is moved.
> > >
> > > > +} __aligned(8);
> > > > +
> > > > +enum bpf_dynptr_type {
> > >
> > > it's a good idea to have default zero value to be BPF_DYNPTR_TYPE_INVALID
> > >
> > > > +       /* Local memory used by the bpf program (eg stack variable) */
> > > > +       BPF_DYNPTR_TYPE_LOCAL,
> > > > +       /* Memory allocated dynamically by the kernel for the dynptr */
> > > > +       BPF_DYNPTR_TYPE_MALLOC,
> > > > +};
> > > > +
> > > > +/* The upper 4 bits of dynptr->size are reserved. Consequently, the
> > > > + * maximum supported size is 2^28 - 1.
> > > > + */
> > > > +#define DYNPTR_MAX_SIZE        ((1UL << 28) - 1)
> > > > +#define DYNPTR_SIZE_MASK       0xFFFFFFF
> > > > +#define DYNPTR_TYPE_SHIFT      29
> > >
> > > I'm thinking that maybe we should start with reserving entire upper
> > > byte in size and offset to be on the safer side? And if 16MB of
> > > addressable memory blob isn't enough, we can always relax it later.
> > > WDYT?
> > >
> > This sounds great! I will make this change for v2
>
> sounds good
>
> > > > +
> > > > +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> > > > +{
> > > > +       return ptr->size >> DYNPTR_TYPE_SHIFT;
> > > > +}
> > > > +
> > > > +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> > > > +{
> > > > +       ptr->size |= type << DYNPTR_TYPE_SHIFT;
> > > > +}
> > > > +
> > > > +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> > > > +{
> > > > +       return ptr->size & DYNPTR_SIZE_MASK;
> > > > +}
> > > > +
> > > > +static inline int bpf_dynptr_check_size(u32 size)
> > > > +{
> > > > +       if (size == 0)
> > > > +               return -EINVAL;
> > >
> > > What's the downside of allowing size 0? Honest question. I'm wondering
> > > why prevent having dynptr pointing to an "empty slice"? It might be a
> > > useful feature in practice.
> > I don't see the use of dynptrs that point to something of size 0, so I
> > thought it'd be simplest to just return an -EINVAL if the user tries
> > to create one. I don't have a particular preference for handling this
> > though - especially if this will be a useful feature in the future,
> > then I agree we should just let the user create and use empty slices
> > if they wish to.
>
> taking Go slices as an example, empty slice is a useful thing
> sometimes, makes some algorithms more uniform.
>
> > >
> > > > +
> > > > +       if (size > DYNPTR_MAX_SIZE)
> > > > +               return -E2BIG;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
> > > > +{
> > > > +       u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
> > > > +
> > > > +       if (len > capacity || offset > capacity - len)
> > > > +               return -EINVAL;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> > > > +                    u32 offset, u32 size);
> > > > +
> > > > +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> > > > +
> > > >  #endif /* _LINUX_BPF_H */
> > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > > index 7a01adc9e13f..bc0f105148f9 100644
> > > > --- a/include/linux/bpf_verifier.h
> > > > +++ b/include/linux/bpf_verifier.h
> > > > @@ -72,6 +72,18 @@ struct bpf_reg_state {
> > > >
> > > >                 u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
> > > >
> > > > +               /* for dynptr stack slots */
> > > > +               struct {
> > > > +                       enum bpf_dynptr_type dynptr_type;
> > > > +                       /* A dynptr is 16 bytes so it takes up 2 stack slots.
> > > > +                        * We need to track which slot is the first slot
> > > > +                        * to protect against cases where the user may try to
> > > > +                        * pass in an address starting at the second slot of the
> > > > +                        * dynptr.
> > > > +                        */
> > > > +                       bool dynptr_first_slot;
> > > > +               };
> > >
> > > why not
> > >
> > > struct {
> > >     enum bpf_dynptr_type type;
> > >     bool first_slot;
> > > } dynptr;
> > >
> > > ? I think it's cleaner grouping
> > Agreed! I will make this change for v2
> > >
> > [...]
> > >
> > > > + *     Description
> > > > + *             Dynamically allocate memory of *size* bytes.
> > > > + *
> > > > + *             Every call to bpf_malloc must have a corresponding
> > > > + *             bpf_free, regardless of whether the bpf_malloc
> > > > + *             succeeded.
> > > > + *
> > > > + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> > > > + *     Return
> > > > + *             0 on success, -ENOMEM if there is not enough memory for the
> > > > + *             allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
> > > > + *
> > > > + * void bpf_free(struct bpf_dynptr *ptr)
> > >
> > > thinking about the next patch set that will add storing this malloc
> > > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > > decrement a refcnt, not necessarily free it, right? So maybe just
> > > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > > a bit more "truthful"?
> > I like the simplicity of bpf_free(), but I can see how that might be
> > confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> > we get into dynptrs that are stored in maps vs. dynptrs stored
> > locally, calling bpf_dynptr_free() frees (invalidates) your local
> > dynptr even if it doesn't free the underlying memory if it still has
> > valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> > together as a pair.
>
> Sounds good to me (though let's use _dynptr() as a suffix
> consistently). I also just realized that maybe we should call
> bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
> want to enable plain bpf_malloc() with statically known size (similar
> to statically known bpf_ringbuf_reserve()) for temporary local malloc
> with direct memory access? So bpf_malloc_dynptr() would be a
> dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
> bpf_free() will work with direct pointer returned from bpf_malloc(),
> while bpf_free_dynptr() will work with dynptr returned from
> bpf_malloc_dynptr().
I see! What is the advantage of a plain bpf_malloc()? Is it that it's
a more ergonomic API (you get back a direct pointer to the data
instead of getting back a dynptr and then having to call
bpf_dynptr_data to get direct access) and you don't have to allocate
extra bytes for refcounting?

I will rename this to bpf_malloc_dynptr() and bpf_free_dynptr().
>
> > >
> > > > + *     Description
> > > > + *             Free memory allocated by bpf_malloc.
> > > > + *
> > > > + *             After this operation, *ptr* will be an invalidated dynptr.
> > > > + *     Return
> > > > + *             Void.
> > > >   */
> > [...]
> > > > +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> > > > +       .func           = bpf_dynptr_from_mem,
> > > > +       .gpl_only       = false,
> > > > +       .ret_type       = RET_INTEGER,
> > > > +       .arg1_type      = ARG_PTR_TO_MEM,
> > >
> > > need to think what to do with uninit stack slots. Do we need
> > > bpf_dnptr_from_uninit_mem() or we just allow ARG_PTR_TO_MEM |
> > > MEM_UNINIT here?
> > I think we can just change this to ARG_PTR_TO_MEM | MEM_UNINIT.
>
> sgtm
>
> > >
> > > > +       .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
> > > > +       .arg3_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> > > > +};
> > > > +
> > > > +BPF_CALL_2(bpf_malloc, u32, size, struct bpf_dynptr_kern *, ptr)
> > > > +{
> > > > +       void *data;
> > > > +       int err;
> > > > +
> > > > +       err = bpf_dynptr_check_size(size);
> > > > +       if (err) {
> > > > +               bpf_dynptr_set_null(ptr);
> > > > +               return err;
> > > > +       }
> > > > +
> > > > +       data = kmalloc(size, GFP_ATOMIC);
> > >
> > > we have this fancy logic now to allow non-atomic allocation inside
> > > sleepable programs, can we use that here as well? In sleepable mode it
> > > would be nice to wait for malloc() to grab necessary memory, if
> > > possible.
> > Agreed - I'm planning to do this in a later "dynptr optimizations"
> > patchset (which will also include inlining BPF instructions for some
> > of the helper functions)
>
> great, ok
>
> > >
> > > > +       if (!data) {
> > > > +               bpf_dynptr_set_null(ptr);
> > > > +               return -ENOMEM;
> > > > +       }
> > > > +
> > >
> > > so.... kmalloc() doesn't zero initialize the memory. I think it's a
> > > great property (which we can later modify with flags, if necessary),
> > > so I'd do zero-initialization by default. we can keep calling it
> > > bpf_malloc() instead of bpf_zalloc(), of course.
> > >
> > [...]
> > > > +static inline int get_spi(s32 off)
> > > > +{
> > > > +       return (-off - 1) / BPF_REG_SIZE;
> > > > +}
> > > > +
> > > > +static bool check_spi_bounds(struct bpf_func_state *state, int spi, u32 nr_slots)
> > >
> > > "check_xxx"/"validate_xxx" pattern has ambiguity when it comes to
> > > interpreting its return value. In some cases it would be 0 for success
> > > and <0 for error, in this it's true/false where probably true meaning
> > > all good. It's unfortunate to have to think about this when reading
> > > code. If you call it something like "is_stack_range_valid" it would be
> > > much more natural to read and reason about, IMO.
> > Great point! I'll change this naming for v2
> > >
> > > BTW, what does "spi" stand for? "stack pointer index"? slot_idx?
> > It's not formally documented anywhere but I assume it's short for
> > "stack pointer index".
> > >
> > > > +{
> > > > +       int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > > > +
> > > > +       return allocated_slots > spi && nr_slots - 1 <= spi;
> > >
> > > ok, this is personal preferences, but it took me considerable time to
> > > try to understand what's being checked here (this backwards grow of
> > > slot indices also threw me off). But seems like we have a range of
> > > slots that are calculated as [spi - nr_slots + 1, spi] and we want to
> > > check that it's within [0, allocated_stack), so most straightforward
> > > way would be:
> > >
> > > return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
> > >
> > > And I'd definitely leave a comment about this whole index grows
> > > downwards (it's not immediately obvious even if you know that indices
> > > are derived from negative stack offsets)
> > Awesome, I will make these edits for v2
> > >
> > [...]
> > > > +       switch (type) {
> > > > +       case DYNPTR_TYPE_LOCAL:
> > > > +               *dynptr_type = BPF_DYNPTR_TYPE_LOCAL;
> > > > +               break;
> > > > +       case DYNPTR_TYPE_MALLOC:
> > > > +               *dynptr_type = BPF_DYNPTR_TYPE_MALLOC;
> > > > +               break;
> > > > +       default:
> > > > +               /* Can't have more than one type set and can't have no
> > > > +                * type set
> > > > +                */
> > > > +               return -EINVAL;
> > >
> > > see above about BPF_DYNPTR_TYPE_INVALID, with that you don't have to
> > > use out parameter, just return enum bpf_dynptr_type directly with
> > > BPF_DYNPTR_TYPE_INVALID marking an error
> > Nice! I love this suggestion - it makes this a lot smoother.
> > >
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > [...]
> > > > +
> > > > +/* Check if the dynptr argument is a proper initialized dynptr */
> > > > +static bool check_dynptr_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > > +                             enum bpf_arg_type arg_type)
> > >
> > > is_dynptr_valid()? You are not checking if it's just initialized but
> > > also that it matches arg_type, right? Also see my rambling about
> > > check_xxx naming
> > I will rename this to is_dynptr_init_valid(). is_dynptr_valid() might
> > be too generic - for example, a valid dynptr should also have its
> > stack slots marked accordingly, which isn't true here since this
> > dynptr is uninitialized. I think is_dynptr_init_valid() will be
> > clearest
>
> sgtm
>
> > >
> > > > +{
> > > > +       struct bpf_func_state *state = func(env, reg);
> > > > +       enum bpf_dynptr_type expected_type;
> > > > +       int spi, err;
> > > > +
> > > > +       /* Can't pass in a dynptr at a weird offset */
> > > > +       if (reg->off % BPF_REG_SIZE)
> > > > +               return false;
> > > > +
> > > > +       spi = get_spi(reg->off);
> > > > +
> > > > +       if (!check_spi_bounds(state, spi, BPF_DYNPTR_NR_SLOTS))
> > > > +               return false;
> > > > +
> > > > +       if (!state->stack[spi].spilled_ptr.dynptr_first_slot)
> > > > +               return false;
> > > > +
> > > > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > > > +               return false;
> > > > +
> > > > +       /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > > > +       if (arg_type == ARG_PTR_TO_DYNPTR)
> > > > +               return true;
> > > > +
> > > > +       err = arg_to_dynptr_type(arg_type, &expected_type);
> > > > +       if (unlikely(err))
> > > > +               return err;
> > > > +
> > > > +       return state->stack[spi].spilled_ptr.dynptr_type == expected_type;
> > > > +}
> > [...]
> > > > +/*
> > > > + * Determines whether the id used for reference tracking is held in a stack slot
> > > > + * or in a register
> > > > + */
> > > > +static bool id_in_stack_slot(enum bpf_arg_type arg_type)
> > >
> > > is_ or has_ is a good idea for such bool-returning helpers (similarly
> > > for stack_access_into_dynptr above), otherwise it reads like a verb
> > > and command to do something
> > >
> > > but looking few lines below, if (arg_type_is_dynptr()) would be
> > > clearer than extra wrapper function, not sure what's the purpose of
> > > the helper
> > My thinking behind this extra wrapper function was that it'd be more
> > extensible in the future if there are other types that will store
> > their id in the stack slot. But I think I'm over-optimizing here :)
> > I'll remove this wrapper function
>
> yeah, it's internal implementation, so we can always refactor
>
> > >
> > [...]
> > > > @@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > > >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > > >
> > > >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > > +       } else if (arg_type_is_dynptr(arg_type)) {
> > > > +               bool initialized = check_dynptr_init(env, reg, arg_type);
> > > > +
> > > > +               if (type_is_uninit_mem(arg_type)) {
> > > > +                       if (initialized) {
> > > > +                               verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
> > > > +                                       arg + 1);
> > > > +                               return -EINVAL;
> > > > +                       }
> > > > +                       meta->raw_mode = true;
> > > > +                       err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
> > > > +                       /* For now, we do not allow dynptrs to point to existing
> > > > +                        * refcounted memory
> > > > +                        */
> > > > +                       if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {
> > >
> > > hard-coded BPF_REG_1?
> >
> > I'm viewing this as a temporary line because one of the patches in a
> > later dynptr patchset will enable support for local dynptrs to point
> > to existing refcounted memory. The alternative is to add a new
> > bpf_type_flag like NO_REFCOUNT and then remove that flag later. What
> > are your thoughts?
>
> my concern and confusion was that it's a hard-coded BPF_REG_1 instead
> of using arg to derive register itself. Why making unnecessary
> assumptions about this always being a first argument?
I think otherwise we need to add a temporary bpf_type_flag that
specifies that an arg cannot be refcounted, and then when we allow
local dynptrs to point to refcounted memory later on, we'll need to
remove this flag and the verifier checks associated with it. But
overall, I agree with you that we should just add this bpf_type_flag
to this patchset rather than using BPF_REG_1 as a temporary solution -
I will make this change for v2!
>
> > >
> > > > +                               verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
> > > > +                                       arg + 1);
> > > > +                               return -EINVAL;
> > > > +                       }
> > > > +               } else {
> > > > +                       if (!initialized) {
> > > > +                               char *err_extra = "";
> > >
> > > const char *
> > >
> > > > +
> > > > +                               if (arg_type & DYNPTR_TYPE_LOCAL)
> > > > +                                       err_extra = "local ";
> > > > +                               else if (arg_type & DYNPTR_TYPE_MALLOC)
> > > > +                                       err_extra = "malloc ";
> > > > +                               verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > > > +                                       err_extra, arg + 1);
> > >
> > > what if helper accepts two or more different types of dynptr?
> > Currently, bpf_dynptr_read/write accept any type of dynptr so they
> > don't set any dynptr type flag, which means this error would just
> > print "Expected an initialized dynptr as arg...". But you're right
> > that in the future, there can be some API that accepts only a subset
> > (eg mallocs and ringbuffers and not local dynptrs); in this case,
> > maybe the simplest is just to return a generic "Expected an
> > initialized dynptr as arg...". Do you think this suffices or do you
> > think it'd be worth the effort to print out the different types of
> > initialized dynptrs it expects?
>
> Yeah, let's keep it simple with generic error, doing multiple string
> concatenations for this seems like an overkill.
>
> > >
> > > > +                               return -EINVAL;
> > > > +                       }
> > > > +                       if (type_is_release_mem(arg_type))
> > > > +                               err = unmark_stack_slots_dynptr(env, reg);
> > > > +               }
> > > >         } else if (arg_type_is_alloc_size(arg_type)) {
> > > >                 if (!tnum_is_const(reg->var_off)) {
> > > >                         verbose(env, "R%d is not a known constant'\n",
> > >
> > > [...]
> > Thanks for your feedback, Andrii!!
>
> sure, np


* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-08 23:37         ` Joanne Koong
@ 2022-04-09  1:11           ` Alexei Starovoitov
  2022-04-12  2:12             ` Andrii Nakryiko
  2022-04-12  2:05           ` Andrii Nakryiko
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2022-04-09  1:11 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Andrii Nakryiko, Joanne Koong, bpf, Andrii Nakryiko,
	Alexei Starovoitov, Daniel Borkmann

On Fri, Apr 08, 2022 at 04:37:02PM -0700, Joanne Koong wrote:
> > > > > + *
> > > > > + * void bpf_free(struct bpf_dynptr *ptr)
> > > >
> > > > thinking about the next patch set that will add storing this malloc
> > > > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > > > decrement a refcnt, not necessarily free it, right? So maybe just
> > > > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > > > a bit more "truthful"?
> > > I like the simplicity of bpf_free(), but I can see how that might be
> > > confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> > > we get into dynptrs that are stored in maps vs. dynptrs stored
> > > locally, calling bpf_dynptr_free() frees (invalidates) your local
> > > dynptr even if it doesn't free the underlying memory if it still has
> > > valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> > > together as a pair.
> >
> > Sounds good to me (though let's use _dynptr() as a suffix
> > consistently). I also just realized that maybe we should call
> > bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
> > want to enable plain bpf_malloc() with statically known size (similar
> > to statically known bpf_ringbuf_reserve()) for temporary local malloc
> > with direct memory access? So bpf_malloc_dynptr() would be a
> > dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
> > bpf_free() will work with direct pointer returned from bpf_malloc(),
> > while bpf_free_dynptr() will work with dynptr returned from
> > bpf_malloc_dynptr().
> I see! What is the advantage of a plain bpf_malloc()? Is it that it's
> a more ergonomic API (you get back a direct pointer to the data
> instead of getting back a dynptr and then having to call
> bpf_dynptr_data to get direct access) and you don't have to allocate
> extra bytes for refcounting?
> 
> I will rename this to bpf_malloc_dynptr() and bpf_free_dynptr().

Let's make it consistent with kptr. Those helpers will be:
bpf_kptr_alloc(btf_id, flags, &ptr)
bpf_kptr_get
bpf_kptr_put

bpf_dynptr_alloc(byte_size, flags, &dynptr);
bpf_dynptr_put(dynptr);
would fit the best.

Output arg being first doesn't match anything we had.
Let's keep it last.

zero-alloc or plain kmalloc can be indicated by the flag.
kzalloc() in the kernel is just static inline that adds __GFP_ZERO to flags.
We don't need bpf_dynptr_alloc and bpf_dynptr_zalloc as two helpers.
The latter can be a static inline helper in a bpf program.
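The kzalloc() analogy can be sketched in plain C. Below is a userspace model assuming a hypothetical BPF_DYNPTR_F_ZERO flag and the proposed bpf_dynptr_alloc() signature (none of these names are a real UAPI; they are taken from this discussion for illustration only):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag and layout, modeled on this discussion. */
#define BPF_DYNPTR_F_ZERO (1U << 0)

struct bpf_dynptr { void *data; unsigned int size; };

static int bpf_dynptr_alloc(unsigned int size, unsigned int flags,
			    struct bpf_dynptr *ptr)
{
	ptr->data = malloc(size);
	if (!ptr->data)
		return -ENOMEM;
	if (flags & BPF_DYNPTR_F_ZERO)
		memset(ptr->data, 0, size);
	ptr->size = size;
	return 0;
}

/* The point made above: zalloc does not need to be a second helper.
 * Like kzalloc() adding __GFP_ZERO to kmalloc()'s flags, it can be a
 * static inline wrapper in the BPF program that adds the zero flag. */
static inline int bpf_dynptr_zalloc(unsigned int size,
				    struct bpf_dynptr *ptr)
{
	return bpf_dynptr_alloc(size, BPF_DYNPTR_F_ZERO, ptr);
}
```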

Similar to Andrii's concern I feel that bpf_dynptr_free() would be misleading.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-08 23:37         ` Joanne Koong
  2022-04-09  1:11           ` Alexei Starovoitov
@ 2022-04-12  2:05           ` Andrii Nakryiko
  1 sibling, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-12  2:05 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann

On Fri, Apr 8, 2022 at 4:37 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 3:46 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Apr 8, 2022 at 3:04 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Wed, Apr 6, 2022 at 3:23 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Fri, Apr 1, 2022 at 7:00 PM Joanne Koong <joannekoong@fb.com> wrote:
> > > > >
> > > > > From: Joanne Koong <joannelkoong@gmail.com>
> > > > >
> > > > > This patch adds 3 new APIs and the bulk of the verifier work for
> > > > > supporting dynamic pointers in bpf.
> > > > >
> > > > > There are different types of dynptrs. This patch starts with the most
> > > > > basic ones, ones that reference a program's local memory
> > > > > (eg a stack variable) and ones that reference memory that is dynamically
> > > > > allocated on behalf of the program. If the memory is dynamically
> > > > > allocated by the program, the program *must* free it before the program
> > > > > exits. This is enforced by the verifier.
> > > > >
> > > > > The added APIs are:
> > > > >
> > > > > long bpf_dynptr_from_mem(void *data, u32 size, struct bpf_dynptr *ptr);
> > > > > long bpf_malloc(u32 size, struct bpf_dynptr *ptr);
> > > > > void bpf_free(struct bpf_dynptr *ptr);
> > > > >
> > > > > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > > > > reside on the program's stack frame. As such, their state is tracked
> > > > > in their corresponding stack slot, which includes the type of dynptr
> > > > > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> > > > >
> > > > > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > > > > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > > > > where the dynptr resides are marked as STACK_DYNPTR. For helper functions
> > > > > that take in initialized dynptrs (such as the next patch in this series
> > > > > which supports dynptr reads/writes), the verifier enforces that the
> > > > > dynptr has been initialized by checking that their corresponding stack
> > > > > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > > > > (eg bpf_free) will clear the stack slots. The verifier enforces at program
> > > > > exit that there are no dynptr stack slots that need to be released.
> > > > >
> > > > > There are other constraints that are enforced by the verifier as
> > > > > well, such as that the dynptr cannot be written to directly by the bpf
> > > > > program or by non-dynptr helper functions. The last patch in this series
> > > > > contains tests that trigger different cases that the verifier needs to
> > > > > successfully reject.
> > > > >
> > > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > > ---
> > > > >  include/linux/bpf.h            |  74 ++++++++-
> > > > >  include/linux/bpf_verifier.h   |  18 +++
> > > > >  include/uapi/linux/bpf.h       |  40 +++++
> > > > >  kernel/bpf/helpers.c           |  88 +++++++++++
> > > > >  kernel/bpf/verifier.c          | 266 ++++++++++++++++++++++++++++++++-
> > > > >  scripts/bpf_doc.py             |   2 +
> > > > >  tools/include/uapi/linux/bpf.h |  40 +++++
> > > > >  7 files changed, 521 insertions(+), 7 deletions(-)
> > > > >

[...]

> > > > > + *     Description
> > > > > + *             Dynamically allocate memory of *size* bytes.
> > > > > + *
> > > > > + *             Every call to bpf_malloc must have a corresponding
> > > > > + *             bpf_free, regardless of whether the bpf_malloc
> > > > > + *             succeeded.
> > > > > + *
> > > > > + *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> > > > > + *     Return
> > > > > + *             0 on success, -ENOMEM if there is not enough memory for the
> > > > > + *             allocation, -EINVAL if the size is 0 or exceeds DYNPTR_MAX_SIZE.
> > > > > + *
> > > > > + * void bpf_free(struct bpf_dynptr *ptr)
> > > >
> > > > thinking about the next patch set that will add storing this malloc
> > > > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > > > decrement a refcnt, not necessarily free it, right? So maybe just
> > > > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > > > a bit more "truthful"?
> > > I like the simplicity of bpf_free(), but I can see how that might be
> > > confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> > > we get into dynptrs that are stored in maps vs. dynptrs stored
> > > locally, calling bpf_dynptr_free() frees (invalidates) your local
> > > dynptr even if it doesn't free the underlying memory if it still has
> > > valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> > > together as a pair.
> >
> > Sounds good to me (though let's use _dynptr() as a suffix
> > consistently). I also just realized that maybe we should call
> > bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
> > want to enable plain bpf_malloc() with statically known size (similar
> > to statically known bpf_ringbuf_reserve()) for temporary local malloc
> > with direct memory access? So bpf_malloc_dynptr() would be a
> > dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
> > bpf_free() will work with direct pointer returned from bpf_malloc(),
> > while bpf_free_dynptr() will work with dynptr returned from
> > bpf_malloc_dynptr().
> I see! What is the advantage of a plain bpf_malloc()? Is it that it's
> a more ergonomic API (you get back a direct pointer to the data
> instead of getting back a dynptr and then having to call
> bpf_dynptr_data to get direct access) and you don't have to allocate
> extra bytes for refcounting?
>

That, but also I was thinking it would be good to have a simple way
to get a temporary buffer (active for the duration of a single BPF
program run) instead of having to rely on a per-CPU array, one that
would work even in the presence of CPU migrations (a per-CPU array
won't) for sleepable BPF programs. But then I recalled that we disable
migrations for sleepable programs, so there are not many real
advantages to such a form of malloc/free. So please disregard.

> I will rename this to bpf_malloc_dynptr() and bpf_free_dynptr().
> >
> > > >
> > > > > + *     Description
> > > > > + *             Free memory allocated by bpf_malloc.
> > > > > + *
> > > > > + *             After this operation, *ptr* will be an invalidated dynptr.
> > > > > + *     Return
> > > > > + *             Void.
> > > > >   */
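The must-pair contract documented above can be modeled in userspace C. This is an illustrative sketch, not the kernel implementation; the DYNPTR_MAX_SIZE value and the struct layout are assumptions:

```c
#include <errno.h>
#include <stdlib.h>

/* Placeholder limit; the real value is defined by the kernel patch. */
#define DYNPTR_MAX_SIZE (1U << 24)

struct bpf_dynptr { void *data; unsigned int size; };

static int bpf_malloc(unsigned int size, struct bpf_dynptr *ptr)
{
	/* Zero the dynptr first, so a failed bpf_malloc still leaves
	 * it in a state where bpf_free is safe to call. */
	ptr->data = NULL;
	ptr->size = 0;
	if (size == 0 || size > DYNPTR_MAX_SIZE)
		return -EINVAL;
	ptr->data = malloc(size);
	if (!ptr->data)
		return -ENOMEM;
	ptr->size = size;
	return 0;
}

/* Safe even when bpf_malloc failed (free(NULL) is a no-op), matching
 * "regardless of whether the bpf_malloc succeeded". */
static void bpf_free(struct bpf_dynptr *ptr)
{
	free(ptr->data);
	ptr->data = NULL;
	ptr->size = 0;
}
```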

[...]

> > > > > @@ -5572,6 +5758,40 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > > > >                 bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > > > >
> > > > >                 err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > > > +       } else if (arg_type_is_dynptr(arg_type)) {
> > > > > +               bool initialized = check_dynptr_init(env, reg, arg_type);
> > > > > +
> > > > > +               if (type_is_uninit_mem(arg_type)) {
> > > > > +                       if (initialized) {
> > > > > +                               verbose(env, "Arg #%d dynptr cannot be an initialized dynptr\n",
> > > > > +                                       arg + 1);
> > > > > +                               return -EINVAL;
> > > > > +                       }
> > > > > +                       meta->raw_mode = true;
> > > > > +                       err = check_helper_mem_access(env, regno, BPF_DYNPTR_SIZE, false, meta);
> > > > > +                       /* For now, we do not allow dynptrs to point to existing
> > > > > +                        * refcounted memory
> > > > > +                        */
> > > > > +                       if (reg_type_may_be_refcounted_or_null(regs[BPF_REG_1].type)) {
> > > >
> > > > hard-coded BPF_REG_1?
> > >
> > > I'm viewing this as a temporary line because one of the patches in a
> > > later dynptr patchset will enable support for local dynptrs to point
> > > to existing refcounted memory. The alternative is to add a new
> > > bpf_type_flag like NO_REFCOUNT and then remove that flag later. What
> > > are your thoughts?
> >
> > my concern and confusion was that it's a hard-coded BPF_REG_1 instead
> > of using arg to derive register itself. Why making unnecessary
> > assumptions about this always being a first argument?
> I think otherwise we need to add a temporary bpf_type_flag that
> specifies that an arg cannot be refcounted, and then when we allow
> local dynptrs to point to refcounted memory later on, we'll need to
> remove this flag and the verifier checks associated with it. But
> overall, I agree with you that we should just add this bpf_type_flag
> to this patchset rather than using BPF_REG_1 as a temporary solution -
> I will make this change for v2!

Ok, I'll wait for v2 as I still can't understand this bit, tbh.

> >
> > > >
> > > > > +                               verbose(env, "Arg #%d dynptr memory cannot be potentially refcounted\n",
> > > > > +                                       arg + 1);
> > > > > +                               return -EINVAL;
> > > > > +                       }
> > > > > +               } else {
> > > > > +                       if (!initialized) {
> > > > > +                               char *err_extra = "";
> > > >

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-09  1:11           ` Alexei Starovoitov
@ 2022-04-12  2:12             ` Andrii Nakryiko
  2022-04-15  1:43               ` Joanne Koong
  0 siblings, 1 reply; 32+ messages in thread
From: Andrii Nakryiko @ 2022-04-12  2:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joanne Koong, Joanne Koong, bpf, Andrii Nakryiko,
	Alexei Starovoitov, Daniel Borkmann

On Fri, Apr 8, 2022 at 6:12 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 08, 2022 at 04:37:02PM -0700, Joanne Koong wrote:
> > > > > > + *
> > > > > > + * void bpf_free(struct bpf_dynptr *ptr)
> > > > >
> > > > > thinking about the next patch set that will add storing this malloc
> > > > > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > > > > decrement a refcnt, not necessarily free it, right? So maybe just
> > > > > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > > > > a bit more "truthful"?
> > > > I like the simplicity of bpf_free(), but I can see how that might be
> > > > confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> > > > we get into dynptrs that are stored in maps vs. dynptrs stored
> > > > locally, calling bpf_dynptr_free() frees (invalidates) your local
> > > > dynptr even if it doesn't free the underlying memory if it still has
> > > > valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> > > > together as a pair.
> > >
> > > Sounds good to me (though let's use _dynptr() as a suffix
> > > consistently). I also just realized that maybe we should call
> > > bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
> > > want to enable plain bpf_malloc() with statically known size (similar
> > > to statically known bpf_ringbuf_reserve()) for temporary local malloc
> > > with direct memory access? So bpf_malloc_dynptr() would be a
> > > dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
> > > bpf_free() will work with direct pointer returned from bpf_malloc(),
> > > while bpf_free_dynptr() will work with dynptr returned from
> > > bpf_malloc_dynptr().
> > I see! What is the advantage of a plain bpf_malloc()? Is it that it's
> > a more ergonomic API (you get back a direct pointer to the data
> > instead of getting back a dynptr and then having to call
> > bpf_dynptr_data to get direct access) and you don't have to allocate
> > extra bytes for refcounting?
> >
> > I will rename this to bpf_malloc_dynptr() and bpf_free_dynptr().
>
> Let's make it consistent with kptr. Those helpers will be:
> bpf_kptr_alloc(btf_id, flags, &ptr)
> bpf_kptr_get
> bpf_kptr_put
>
> bpf_dynptr_alloc(byte_size, flags, &dynptr);

I don't have strong feelings about this naming, but
bpf_ringbuf_reserve_dynptr() is a bit of counter-example with a
convention of using "_dynptr" suffix for variations of API that
*produce* dynptrs as an output. bpf_dynptr_alloc() sounds like we are
allocating struct bpf_dynptr itself, not the memory to which the
bpf_dynptr points. But I don't have a perfect naming scheme.

> bpf_dynptr_put(dynptr);
> would fit the best.
>
> Output arg being first doesn't match anything we had.
> Let's keep it last.

yep, agree

>
> zero-alloc or plain kmalloc can be indicated by the flag.
> kzalloc() in the kernel is just static inline that adds __GFP_ZERO to flags.
> We don't need bpf_dynptr_alloc and bpf_dynptr_zalloc as two helpers.
> The latter can be a static inline helper in a bpf program.

yeah, sure, my point was that zero-initialization is a better default

>
> Similar to Andrii's concern I feel that bpf_dynptr_free() would be misleading.
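The concern above, that "free" becomes a misnomer once the allocation can also be held by a map, comes down to put semantics: the helper drops one reference, and the memory is actually freed only when the last reference goes away. A minimal userspace sketch of that semantics (the names and the layout are hypothetical, not the kernel implementation):

```c
#include <stdlib.h>

/* Hypothetical layout: a refcount stored alongside the allocation so
 * a malloc-backed dynptr can also be referenced from a map. */
struct dyn_mem {
	long refcnt;
	unsigned char data[];
};

static struct dyn_mem *dyn_alloc(size_t size)
{
	struct dyn_mem *m = malloc(sizeof(*m) + size);

	if (m)
		m->refcnt = 1;
	return m;
}

static void dyn_get(struct dyn_mem *m)
{
	m->refcnt++;
}

/* "put" semantics: drop one reference; the memory is freed only when
 * no reference (program-local or map-held) remains. */
static void dyn_put(struct dyn_mem *m)
{
	if (--m->refcnt == 0)
		free(m);
}
```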

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free
  2022-04-12  2:12             ` Andrii Nakryiko
@ 2022-04-15  1:43               ` Joanne Koong
  0 siblings, 0 replies; 32+ messages in thread
From: Joanne Koong @ 2022-04-15  1:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Joanne Koong, bpf, Andrii Nakryiko,
	Alexei Starovoitov, Daniel Borkmann

On Mon, Apr 11, 2022 at 7:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 8, 2022 at 6:12 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Apr 08, 2022 at 04:37:02PM -0700, Joanne Koong wrote:
> > > > > > > + *
> > > > > > > + * void bpf_free(struct bpf_dynptr *ptr)
> > > > > >
> > > > > > thinking about the next patch set that will add storing this malloc
> > > > > > dynptr into the map, bpf_free() will be a lie, right? As it will only
> > > > > > decrement a refcnt, not necessarily free it, right? So maybe just
> > > > > > generic bpf_dynptr_put() or bpf_malloc_put() or something like that is
> > > > > > a bit more "truthful"?
> > > > > I like the simplicity of bpf_free(), but I can see how that might be
> > > > > confusing. What are your thoughts on "bpf_dynptr_free()"? Since when
> > > > > we get into dynptrs that are stored in maps vs. dynptrs stored
> > > > > locally, calling bpf_dynptr_free() frees (invalidates) your local
> > > > > dynptr even if it doesn't free the underlying memory if it still has
> > > > > valid refcounts on it? To me, "malloc" and "_free" go more intuitively
> > > > > together as a pair.
> > > >
> > > > Sounds good to me (though let's use _dynptr() as a suffix
> > > > consistently). I also just realized that maybe we should call
> > > > bpf_malloc() a bpf_malloc_dynptr() instead. I can see how we might
> > > > want to enable plain bpf_malloc() with statically known size (similar
> > > > to statically known bpf_ringbuf_reserve()) for temporary local malloc
> > > > with direct memory access? So bpf_malloc_dynptr() would be a
> > > > dynptr-enabled counterpart to fixed-sized bpf_malloc()? And then
> > > > bpf_free() will work with direct pointer returned from bpf_malloc(),
> > > > while bpf_free_dynptr() will work with dynptr returned from
> > > > bpf_malloc_dynptr().
> > > I see! What is the advantage of a plain bpf_malloc()? Is it that it's
> > > a more ergonomic API (you get back a direct pointer to the data
> > > instead of getting back a dynptr and then having to call
> > > bpf_dynptr_data to get direct access) and you don't have to allocate
> > > extra bytes for refcounting?
> > >
> > > I will rename this to bpf_malloc_dynptr() and bpf_free_dynptr().
> >
> > Let's make it consistent with kptr. Those helpers will be:
> > bpf_kptr_alloc(btf_id, flags, &ptr)
> > bpf_kptr_get
> > bpf_kptr_put
> >
> > bpf_dynptr_alloc(byte_size, flags, &dynptr);
>
> I don't have strong feelings about this naming, but
> bpf_ringbuf_reserve_dynptr() is a bit of counter-example with a
> convention of using "_dynptr" suffix for variations of API that
> *produce* dynptrs as an output. bpf_dynptr_alloc() sounds like we are
> allocating struct bpf_dynptr itself, not the memory to which the
> bpf_dynptr points. But I don't have a perfect naming scheme.
I agree. bpf_dynptr_alloc() sounds like it allocates the struct dynptr
- I like bpf_dynptr_malloc() more. But I'm fine going with
bpf_dynptr_alloc() if there's a strong preference for that.
>
> > bpf_dynptr_put(dynptr);
> > would fit the best.
> >
> > Output arg being first doesn't match anything we had.
> > Let's keep it last.
>
> yep, agree
>
> >
> > zero-alloc or plain kmalloc can be indicated by the flag.
> > kzalloc() in the kernel is just static inline that adds __GFP_ZERO to flags.
> > We don't need bpf_dynptr_alloc and bpf_dynptr_zalloc as two helpers.
> > The latter can be a static inline helper in a bpf program.
>
> yeah, sure, my point was that zero-initialization is a better default
>
> >
> > Similar to Andrii's concern I feel that bpf_dynptr_free() would be misleading.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-04-15  1:43 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-02  1:58 [PATCH bpf-next v1 0/7] Dynamic pointers Joanne Koong
2022-04-02  1:58 ` [PATCH bpf-next v1 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
2022-04-06 18:33   ` Andrii Nakryiko
2022-04-02  1:58 ` [PATCH bpf-next v1 2/7] bpf: Add MEM_RELEASE " Joanne Koong
2022-04-04  7:34   ` Kumar Kartikeya Dwivedi
2022-04-04 19:04     ` Joanne Koong
2022-04-06 18:42   ` Andrii Nakryiko
2022-04-02  1:58 ` [PATCH bpf-next v1 3/7] bpf: Add bpf_dynptr_from_mem, bpf_malloc, bpf_free Joanne Koong
2022-04-06 22:23   ` Andrii Nakryiko
2022-04-08 22:04     ` Joanne Koong
2022-04-08 22:46       ` Andrii Nakryiko
2022-04-08 23:37         ` Joanne Koong
2022-04-09  1:11           ` Alexei Starovoitov
2022-04-12  2:12             ` Andrii Nakryiko
2022-04-15  1:43               ` Joanne Koong
2022-04-12  2:05           ` Andrii Nakryiko
2022-04-02  1:58 ` [PATCH bpf-next v1 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
2022-04-02 13:35   ` Toke Høiland-Jørgensen
2022-04-04 20:18     ` Joanne Koong
2022-04-06 22:32   ` Andrii Nakryiko
2022-04-08 23:07     ` Joanne Koong
2022-04-02  1:58 ` [PATCH bpf-next v1 5/7] bpf: Add dynptr data slices Joanne Koong
2022-04-02  1:58 ` [PATCH bpf-next v1 6/7] bpf: Dynptr support for ring buffers Joanne Koong
2022-04-02  6:40   ` kernel test robot
2022-04-06 22:50   ` Andrii Nakryiko
2022-04-02  1:58 ` [PATCH bpf-next v1 7/7] bpf: Dynptr tests Joanne Koong
2022-04-06 23:11   ` Andrii Nakryiko
2022-04-08 23:16     ` Joanne Koong
2022-04-06 23:13 ` [PATCH bpf-next v1 0/7] Dynamic pointers Andrii Nakryiko
2022-04-07 12:44   ` Brendan Jackman
2022-04-07 20:40     ` Joanne Koong
2022-04-08 10:21       ` Brendan Jackman
