bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v2 0/7] Dynamic pointers
@ 2022-04-16  6:34 Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patchset implements the basics of dynamic pointers in bpf.

A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
alongside the address it points to. This abstraction is useful in bpf, given
that every memory access in a bpf program must be safe. The verifier and bpf
helper functions can use the metadata to enforce safety guarantees for things 
such as dynamically sized strings and kernel heap allocations.

From the program side, the bpf_dynptr is an opaque struct: the verifier
enforces that its contents are never written to directly by the program.
They can only be modified through specific bpf helper functions.

There are several use cases for dynamic pointers in bpf programs, including:
dynamically sized ringbuf reservations without any extra memcpys,
dynamic string parsing and memory comparisons, dynamic memory allocations that
can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
data.

At a high-level, the patches are as follows:
1/7 - Adds MEM_UNINIT as a bpf_type_flag
2/7 - Adds OBJ_RELEASE as a bpf_type_flag
3/7 - Adds bpf_dynptr_from_mem, bpf_dynptr_alloc, and bpf_dynptr_put
4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
6/7 - Adds dynptr support for ring buffers
7/7 - Adds tests checking that the verifier rejects certain failure cases and
accepts certain success cases

This is the first dynptr patchset in a larger series. The next series of
patches will add persisting dynamic memory allocations in maps, parsing packet
data through dynptrs, dynptrs to referenced objects, convenience helpers for
using dynptrs as iterators, and more helper functions for interacting with
strings and memory dynamically.

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/

1/7 -
    * Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
      ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
    * Drop arg_type_is_mem_ptr() wrapper function (Andrii)

2/7 - 
    * Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
    * Use meta.release_ref instead of ref_obj_id != 0 to determine whether
      to release reference (Kumar)
    * Drop type_is_release_mem() wrapper function (Andrii) 
	
3/7 -
    * Add checks for nested dynptr edge cases, which could lead to corrupt
      writes of the dynptr stack variable
    * Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
    * Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
      (Alexei)
    * Support alloc flag __GFP_ZERO (Andrii) 
    * Reserve upper 8 bits in dynptr size and offset fields instead of
      reserving just the upper 4 bits (Andrii)
    * Allow dynptr zero-slices (Andrii) 
    * Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
    * Rename check_* functions to is_* functions for better readability
      (Andrii)
    * Add comment for code that checks the spi bounds (Andrii)

4/7 -
    * Fix doc description for bpf_dynptr_read (Toke)
    * Move bpf_dynptr_check_off_len() from patch 1 to this patch (Andrii)

5/7 -
    * When finding the id for the dynptr to associate the data slice with,
      look for dynptr arg instead of assuming it is BPF_REG_1.

6/7 -
    * Add __force when casting from unsigned long to void * (kernel test robot)
    * Expand on docs for ringbuf dynptr APIs (Andrii)

7/7 -
    * Use table approach for defining test programs and error messages (Andrii)
    * Print out full log if there’s an error (Andrii)
    * Use bpf_object__find_program_by_name() instead of specifying
      program name as a string (Andrii)
    * Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
      invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
      and test_alloc_zero_bytes
    * Add checks for edge cases (e.g. allocating with invalid flags)

Joanne Koong (7):
  bpf: Add MEM_UNINIT as a bpf_type_flag
  bpf: Add OBJ_RELEASE as a bpf_type_flag
  bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Add dynptr data slices
  bpf: Dynptr support for ring buffers
  bpf: Dynptr tests

 include/linux/bpf.h                           | 109 ++-
 include/linux/bpf_verifier.h                  |  33 +-
 include/uapi/linux/bpf.h                      | 110 +++
 kernel/bpf/bpf_lsm.c                          |   4 +-
 kernel/bpf/btf.c                              |   3 +-
 kernel/bpf/cgroup.c                           |   4 +-
 kernel/bpf/helpers.c                          | 212 +++++-
 kernel/bpf/ringbuf.c                          |  75 +-
 kernel/bpf/stackmap.c                         |   6 +-
 kernel/bpf/verifier.c                         | 538 +++++++++++++--
 kernel/trace/bpf_trace.c                      |  30 +-
 net/core/filter.c                             |  28 +-
 scripts/bpf_doc.py                            |   2 +
 tools/include/uapi/linux/bpf.h                | 110 +++
 .../testing/selftests/bpf/prog_tests/dynptr.c | 138 ++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 643 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 217 ++++++
 17 files changed, 2148 insertions(+), 114 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

-- 
2.30.2



* [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-19  4:59   ` Alexei Starovoitov
  2022-04-16  6:34 ` [PATCH bpf-next v2 2/7] bpf: Add OBJ_RELEASE " Joanne Koong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

Instead of having uninitialized versions of arguments as separate
bpf_arg_types (e.g. ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.

Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (e.g. "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since the uninitialized and initialized
versions of the same argument type now share the same base type.

In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h      | 17 +++++++++--------
 kernel/bpf/bpf_lsm.c     |  4 ++--
 kernel/bpf/cgroup.c      |  4 ++--
 kernel/bpf/helpers.c     | 12 ++++++------
 kernel/bpf/stackmap.c    |  6 +++---
 kernel/bpf/verifier.c    | 36 +++++++++++++-----------------------
 kernel/trace/bpf_trace.c | 30 +++++++++++++++---------------
 net/core/filter.c        | 26 +++++++++++++-------------
 8 files changed, 63 insertions(+), 72 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..12b90de9c46d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -342,7 +342,9 @@ enum bpf_type_flag {
 	 */
 	MEM_PERCPU		= BIT(4 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_PERCPU,
+	MEM_UNINIT		= BIT(5 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
 };
 
 /* Max number of base types. */
@@ -361,16 +363,11 @@ enum bpf_arg_type {
 	ARG_CONST_MAP_PTR,	/* const argument used as pointer to bpf_map */
 	ARG_PTR_TO_MAP_KEY,	/* pointer to stack used as map key */
 	ARG_PTR_TO_MAP_VALUE,	/* pointer to stack used as map value */
-	ARG_PTR_TO_UNINIT_MAP_VALUE,	/* pointer to valid memory used to store a map value */
 
-	/* the following constraints used to prototype bpf_memcmp() and other
-	 * functions that access data on eBPF program stack
+	/* Used to prototype bpf_memcmp() and other functions that access data
+	 * on eBPF program stack
 	 */
 	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
-	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
-				 * helper function must fill all bytes or clear
-				 * them in error case.
-				 */
 
 	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
 	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
@@ -400,6 +397,10 @@ enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+	/* pointer to memory does not need to be initialized, helper function must fill
+	 * all bytes or clear them in error case.
+	 */
+	ARG_PTR_TO_MEM_UNINIT		= MEM_UNINIT | ARG_PTR_TO_MEM,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 064eccba641d..11ebadc82e8d 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -94,7 +94,7 @@ static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &bpf_ima_inode_hash_btf_ids[0],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.allowed	= bpf_ima_inode_hash_allowed,
 };
@@ -112,7 +112,7 @@ static const struct bpf_func_proto bpf_ima_file_hash_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &bpf_ima_file_hash_btf_ids[0],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.allowed	= bpf_ima_inode_hash_allowed,
 };
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..4947e3324480 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1724,7 +1724,7 @@ static const struct bpf_func_proto bpf_sysctl_get_current_value_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
@@ -1744,7 +1744,7 @@ static const struct bpf_func_proto bpf_sysctl_get_new_value_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..a47aae5c7335 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -103,7 +103,7 @@ const struct bpf_func_proto bpf_map_pop_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE | MEM_UNINIT,
 };
 
 BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
@@ -116,7 +116,7 @@ const struct bpf_func_proto bpf_map_peek_elem_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE | MEM_UNINIT,
 };
 
 const struct bpf_func_proto bpf_get_prandom_u32_proto = {
@@ -237,7 +237,7 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.func		= bpf_get_current_comm,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE,
 };
 
@@ -616,7 +616,7 @@ const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type      = ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type      = ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type      = ARG_CONST_SIZE,
 };
 
@@ -663,7 +663,7 @@ const struct bpf_func_proto bpf_copy_from_user_proto = {
 	.func		= bpf_copy_from_user,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -693,7 +693,7 @@ const struct bpf_func_proto bpf_copy_from_user_task_proto = {
 	.func		= bpf_copy_from_user_task,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 	.arg4_type	= ARG_PTR_TO_BTF_ID,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 34725bfa1e97..24fdda340008 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -465,7 +465,7 @@ const struct bpf_func_proto bpf_get_stack_proto = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -493,7 +493,7 @@ const struct bpf_func_proto bpf_get_task_stack_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_BTF_ID,
 	.arg1_btf_id	= &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -556,7 +556,7 @@ const struct bpf_func_proto bpf_get_stack_proto_pe = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d175b70067b3..355566979e36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5134,12 +5134,6 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
-static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
-{
-	return base_type(type) == ARG_PTR_TO_MEM ||
-	       base_type(type) == ARG_PTR_TO_UNINIT_MEM;
-}
-
 static bool arg_type_is_mem_size(enum bpf_arg_type type)
 {
 	return type == ARG_CONST_SIZE ||
@@ -5273,7 +5267,6 @@ static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE }
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
 	[ARG_PTR_TO_MAP_VALUE]		= &map_key_value_types,
-	[ARG_PTR_TO_UNINIT_MAP_VALUE]	= &map_key_value_types,
 	[ARG_CONST_SIZE]		= &scalar_types,
 	[ARG_CONST_SIZE_OR_ZERO]	= &scalar_types,
 	[ARG_CONST_ALLOC_SIZE_OR_ZERO]	= &scalar_types,
@@ -5287,7 +5280,6 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_BTF_ID]		= &btf_ptr_types,
 	[ARG_PTR_TO_SPIN_LOCK]		= &spin_lock_types,
 	[ARG_PTR_TO_MEM]		= &mem_types,
-	[ARG_PTR_TO_UNINIT_MEM]		= &mem_types,
 	[ARG_PTR_TO_ALLOC_MEM]		= &alloc_mem_types,
 	[ARG_PTR_TO_INT]		= &int_ptr_types,
 	[ARG_PTR_TO_LONG]		= &int_ptr_types,
@@ -5451,8 +5443,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return -EACCES;
 	}
 
-	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-	    base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		err = resolve_map_arg_type(env, meta, &arg_type);
 		if (err)
 			return err;
@@ -5528,8 +5519,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE ||
-		   base_type(arg_type) == ARG_PTR_TO_UNINIT_MAP_VALUE) {
+	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
@@ -5541,7 +5531,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "invalid map_ptr to access map->value\n");
 			return -EACCES;
 		}
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
+		meta->raw_mode = arg_type & MEM_UNINIT;
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
@@ -5568,11 +5558,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			return -EACCES;
 	} else if (arg_type == ARG_PTR_TO_FUNC) {
 		meta->subprogno = reg->subprogno;
-	} else if (arg_type_is_mem_ptr(arg_type)) {
+	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
-		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MEM);
+		meta->raw_mode = arg_type & MEM_UNINIT;
 	} else if (arg_type_is_mem_size(arg_type)) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
@@ -5894,15 +5884,15 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
 {
 	int count = 0;
 
-	if (fn->arg1_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg1_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg2_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg2_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg3_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg3_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg4_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg4_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
-	if (fn->arg5_type == ARG_PTR_TO_UNINIT_MEM)
+	if (fn->arg5_type == ARG_PTR_TO_MEM_UNINIT)
 		count++;
 
 	/* We only support one arg being in raw mode at the moment,
@@ -5915,9 +5905,9 @@ static bool check_raw_mode_ok(const struct bpf_func_proto *fn)
 static bool check_args_pair_invalid(enum bpf_arg_type arg_curr,
 				    enum bpf_arg_type arg_next)
 {
-	return (arg_type_is_mem_ptr(arg_curr) &&
+	return (base_type(arg_curr) == ARG_PTR_TO_MEM &&
 	        !arg_type_is_mem_size(arg_next)) ||
-	       (!arg_type_is_mem_ptr(arg_curr) &&
+	       (base_type(arg_curr) != ARG_PTR_TO_MEM &&
 		arg_type_is_mem_size(arg_next));
 }
 
@@ -5929,7 +5919,7 @@ static bool check_arg_pair_ok(const struct bpf_func_proto *fn)
 	 * helper function specification.
 	 */
 	if (arg_type_is_mem_size(fn->arg1_type) ||
-	    arg_type_is_mem_ptr(fn->arg5_type)  ||
+	    base_type(fn->arg5_type) == ARG_PTR_TO_MEM ||
 	    check_args_pair_invalid(fn->arg1_type, fn->arg2_type) ||
 	    check_args_pair_invalid(fn->arg2_type, fn->arg3_type) ||
 	    check_args_pair_invalid(fn->arg3_type, fn->arg4_type) ||
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7fa2ebc07f60..36fe08097b3b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -175,7 +175,7 @@ const struct bpf_func_proto bpf_probe_read_user_proto = {
 	.func		= bpf_probe_read_user,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -212,7 +212,7 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
 	.func		= bpf_probe_read_user_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -238,7 +238,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = {
 	.func		= bpf_probe_read_kernel,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -273,7 +273,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto = {
 	.func		= bpf_probe_read_kernel_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -293,7 +293,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_proto = {
 	.func		= bpf_probe_read_compat,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -312,7 +312,7 @@ static const struct bpf_func_proto bpf_probe_read_compat_str_proto = {
 	.func		= bpf_probe_read_compat_str,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg3_type	= ARG_ANYTHING,
 };
@@ -610,7 +610,7 @@ static const struct bpf_func_proto bpf_perf_event_read_value_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1112,7 +1112,7 @@ static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
 	.func		= bpf_get_branch_snapshot,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
@@ -1406,7 +1406,7 @@ static const struct bpf_func_proto bpf_get_stack_proto_tp = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -1469,12 +1469,12 @@ BPF_CALL_3(bpf_perf_prog_read_value, struct bpf_perf_event_data_kern *, ctx,
 }
 
 static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
-         .func           = bpf_perf_prog_read_value,
-         .gpl_only       = true,
-         .ret_type       = RET_INTEGER,
-         .arg1_type      = ARG_PTR_TO_CTX,
-         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
-         .arg3_type      = ARG_CONST_SIZE,
+	.func           = bpf_perf_prog_read_value,
+	.gpl_only       = true,
+	.ret_type       = RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM_UNINIT,
+	.arg3_type      = ARG_CONST_SIZE,
 };
 
 BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx,
diff --git a/net/core/filter.c b/net/core/filter.c
index a7044e98765e..9aafec3a09ed 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1743,7 +1743,7 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1777,7 +1777,7 @@ static const struct bpf_func_proto bpf_flow_dissector_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -1821,7 +1821,7 @@ static const struct bpf_func_proto bpf_skb_load_bytes_relative_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
@@ -3943,7 +3943,7 @@ static const struct bpf_func_proto bpf_xdp_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -3970,7 +3970,7 @@ static const struct bpf_func_proto bpf_xdp_store_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -4544,7 +4544,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 	.arg4_type	= ARG_ANYTHING,
 };
@@ -4579,7 +4579,7 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_opt_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
@@ -5386,7 +5386,7 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5420,7 +5420,7 @@ static const struct bpf_func_proto bpf_sock_addr_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5544,7 +5544,7 @@ static const struct bpf_func_proto bpf_sock_ops_getsockopt_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
@@ -5656,7 +5656,7 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
@@ -10741,7 +10741,7 @@ static const struct bpf_func_proto sk_reuseport_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 };
 
@@ -10759,7 +10759,7 @@ static const struct bpf_func_proto sk_reuseport_load_bytes_relative_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_PTR_TO_MEM_UNINIT,
 	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
-- 
2.30.2



* [PATCH bpf-next v2 2/7] bpf: Add OBJ_RELEASE as a bpf_type_flag
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

Currently, we hardcode in the verifier which functions are release
functions. We have no way of differentiating which argument is the one
to be released (we assume it will always be the first argument).

This patch adds OBJ_RELEASE as a bpf_type_flag. This allows us to
determine which argument in the function needs to be released, and
removes having to hardcode a list of release functions into the
verifier.

Please note that currently, only one release argument per helper
function is supported. If/when several release arguments within a
function need to be supported in the future, OBJ_RELEASE will be
necessary to differentiate which arguments are the release ones.

In the near future, OBJ_RELEASE will be used by dynptr helper functions
such as bpf_dynptr_put.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h          |  4 +++-
 include/linux/bpf_verifier.h |  3 +--
 kernel/bpf/btf.c             |  3 ++-
 kernel/bpf/ringbuf.c         |  4 ++--
 kernel/bpf/verifier.c        | 44 +++++++++++++++++-------------------
 net/core/filter.c            |  2 +-
 6 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 12b90de9c46d..29964cdb1dd6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -344,7 +344,9 @@ enum bpf_type_flag {
 
 	MEM_UNINIT		= BIT(5 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= MEM_UNINIT,
+	OBJ_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= OBJ_RELEASE,
 };
 
 /* Max number of base types. */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c1fc4af47f69..7a01adc9e13f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
 		      const struct bpf_reg_state *reg, int regno);
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func);
+			   enum bpf_arg_type arg_type, bool arg_release);
 int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 			     u32 regno);
 int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0918a39279f6..e5b765a84aec 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5830,7 +5830,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		ref_t = btf_type_skip_modifiers(btf, t->type, &ref_id);
 		ref_tname = btf_name_by_offset(btf, ref_t->name_off);
 
-		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE, rel);
+		ret = check_func_arg_reg_off(env, reg, regno, ARG_DONTCARE,
+					     rel && reg->ref_obj_id);
 		if (ret < 0)
 			return ret;
 
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..5173fd37590f 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_submit_proto = {
 	.func		= bpf_ringbuf_submit,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
@@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
 const struct bpf_func_proto bpf_ringbuf_discard_proto = {
 	.func		= bpf_ringbuf_discard,
 	.ret_type	= RET_VOID,
-	.arg1_type	= ARG_PTR_TO_ALLOC_MEM,
+	.arg1_type	= ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
 	.arg2_type	= ARG_ANYTHING,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 355566979e36..8deb588a19ce 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -257,6 +257,7 @@ struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
+	bool release_ref;
 };
 
 struct btf *btf_vmlinux;
@@ -471,17 +472,6 @@ static bool type_may_be_null(u32 type)
 	return type & PTR_MAYBE_NULL;
 }
 
-/* Determine whether the function releases some resources allocated by another
- * function call. The first reference type argument will be assumed to be
- * released by release_reference().
- */
-static bool is_release_function(enum bpf_func_id func_id)
-{
-	return func_id == BPF_FUNC_sk_release ||
-	       func_id == BPF_FUNC_ringbuf_submit ||
-	       func_id == BPF_FUNC_ringbuf_discard;
-}
-
 static bool may_be_acquire_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_lookup_tcp ||
@@ -5359,11 +5349,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 
 int check_func_arg_reg_off(struct bpf_verifier_env *env,
 			   const struct bpf_reg_state *reg, int regno,
-			   enum bpf_arg_type arg_type,
-			   bool is_release_func)
+			   enum bpf_arg_type arg_type, bool arg_release)
 {
-	bool fixed_off_ok = false, release_reg;
 	enum bpf_reg_type type = reg->type;
+	bool fixed_off_ok = false;
 
 	switch ((u32)type) {
 	case SCALAR_VALUE:
@@ -5388,18 +5377,15 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
-		/* When referenced PTR_TO_BTF_ID is passed to release function,
-		 * it's fixed offset must be 0. We rely on the property that
-		 * only one referenced register can be passed to BPF helpers and
-		 * kfuncs. In the other cases, fixed offset can be non-zero.
+		/* If a referenced PTR_TO_BTF_ID will be released, its fixed offset
+		 * must be 0.
 		 */
-		release_reg = is_release_func && reg->ref_obj_id;
-		if (release_reg && reg->off) {
+		if (arg_release && reg->off) {
 			verbose(env, "R%d must have zero offset when passed to release func\n",
 				regno);
 			return -EINVAL;
 		}
-		/* For release_reg == true, fixed_off_ok must be false, but we
+		/* For arg_release == true, fixed_off_ok must be false, but we
 		 * already checked and rejected reg->off != 0 above, so set to
 		 * true to allow fixed offset for all other cases.
 		 */
@@ -5459,7 +5445,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 	if (err)
 		return err;
 
-	err = check_func_arg_reg_off(env, reg, regno, arg_type, is_release_function(meta->func_id));
+	err = check_func_arg_reg_off(env, reg, regno, arg_type, arg_type & OBJ_RELEASE);
 	if (err)
 		return err;
 
@@ -5476,6 +5462,18 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		}
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
+	if (arg_type & OBJ_RELEASE) {
+		if (!reg->ref_obj_id) {
+			verbose(env, "arg %d is an unacquired reference\n", regno);
+			return -EINVAL;
+		}
+		if (meta->release_ref) {
+			verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
+				regno);
+			return -EFAULT;
+		}
+		meta->release_ref = true;
+	}
 
 	if (arg_type == ARG_CONST_MAP_PTR) {
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
@@ -6688,7 +6686,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
-	if (is_release_function(func_id)) {
+	if (meta.release_ref) {
 		err = release_reference(env, meta.ref_obj_id);
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
diff --git a/net/core/filter.c b/net/core/filter.c
index 9aafec3a09ed..849611a1a51a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
 	.func		= bpf_sk_release,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
 };
 
 BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 2/7] bpf: Add OBJ_RELEASE " Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
                     ` (2 more replies)
  2022-04-16  6:34 ` [PATCH bpf-next v2 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds 3 new APIs and the bulk of the verifier work for
supporting dynamic pointers in bpf.

There are different types of dynptrs. This patch starts with the most
basic ones: dynptrs that reference a program's local memory
(eg a stack variable) and dynptrs that reference memory dynamically
allocated on behalf of the program. If the memory is dynamically
allocated, the program *must* free it before exiting. This is
enforced by the verifier.

The added APIs are:

long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_dynptr_put(struct bpf_dynptr *ptr);

This patch sets up the verifier to support dynptrs. Dynptrs always
reside in the program's stack frame. As such, their state is tracked
in their corresponding stack slots, which includes the type of dynptr
(DYNPTR_LOCAL vs. DYNPTR_MALLOC).

When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
MEM_UNINIT), the stack slots where the dynptr resides are marked as
STACK_DYNPTR. For helper functions that take in initialized dynptrs
(such as the next patch in this series, which adds dynptr reads and
writes), the verifier enforces that the dynptr has been initialized by
checking that its corresponding stack slots have been marked as
STACK_DYNPTR. Dynptr release functions (eg bpf_dynptr_put) clear the
stack slots. At program exit, the verifier enforces that there are no
acquired dynptr stack slots that still need to be released.

The verifier enforces other constraints as well, such as forbidding
the bpf program or non-dynptr helper functions from writing directly
to a dynptr. The last patch in this series contains tests that trigger
different cases that the verifier needs to successfully reject.

For now, local dynptrs cannot point to referenced memory, since that
memory can be freed at any time. Support for this will be added as
part of a separate patchset.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            |  68 +++++-
 include/linux/bpf_verifier.h   |  28 +++
 include/uapi/linux/bpf.h       |  44 ++++
 kernel/bpf/helpers.c           | 110 ++++++++++
 kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
 scripts/bpf_doc.py             |   2 +
 tools/include/uapi/linux/bpf.h |  44 ++++
 7 files changed, 654 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 29964cdb1dd6..fee91b07ee74 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -346,7 +346,16 @@ enum bpf_type_flag {
 
 	OBJ_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= OBJ_RELEASE,
+	/* DYNPTR points to a program's local memory (eg stack variable). */
+	DYNPTR_TYPE_LOCAL	= BIT(7 + BPF_BASE_TYPE_BITS),
+
+	/* DYNPTR points to dynamically allocated memory. */
+	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
+
+	/* May not be a referenced object */
+	NO_OBJ_REF		= BIT(9 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= NO_OBJ_REF,
 };
 
 /* Max number of base types. */
@@ -390,6 +399,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_STACK,	/* pointer to stack */
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
+	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -2394,4 +2404,60 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 **bin_buf, u32 num_args);
 void bpf_bprintf_cleanup(void);
 
+/* the implementation of the opaque uapi struct bpf_dynptr */
+struct bpf_dynptr_kern {
+	void *data;
+	/* Size represents the number of usable bytes in the dynptr.
+	 * If for example the offset is at 200 for a malloc dynptr with
+	 * allocation size 256, the number of usable bytes is 56.
+	 *
+	 * The upper 8 bits are reserved.
+	 * Bit 31 denotes whether the dynptr is read-only.
+	 * Bits 28-30 denote the dynptr type.
+	 */
+	u32 size;
+	u32 offset;
+} __aligned(8);
+
+enum bpf_dynptr_type {
+	BPF_DYNPTR_TYPE_INVALID,
+	/* Local memory used by the bpf program (eg stack variable) */
+	BPF_DYNPTR_TYPE_LOCAL,
+	/* Memory allocated dynamically by the kernel for the dynptr */
+	BPF_DYNPTR_TYPE_MALLOC,
+};
+
+/* Since the upper 8 bits of dynptr->size is reserved, the
+ * maximum supported size is 2^24 - 1.
+ */
+#define DYNPTR_MAX_SIZE	((1UL << 24) - 1)
+#define DYNPTR_SIZE_MASK	0xFFFFFF
+#define DYNPTR_TYPE_SHIFT	28
+#define DYNPTR_TYPE_MASK	0x7
+
+static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
+{
+	return (ptr->size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK;
+}
+
+static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
+{
+	ptr->size |= type << DYNPTR_TYPE_SHIFT;
+}
+
+static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_SIZE_MASK;
+}
+
+static inline int bpf_dynptr_check_size(u32 size)
+{
+	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
+}
+
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size);
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7a01adc9e13f..e11440a44e92 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -72,6 +72,27 @@ struct bpf_reg_state {
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 
+		/* For dynptr stack slots */
+		struct {
+			enum bpf_dynptr_type type;
+			/* A dynptr is 16 bytes so it takes up 2 stack slots.
+			 * We need to track which slot is the first slot
+			 * to protect against cases where the user may try to
+			 * pass in an address starting at the second slot of the
+			 * dynptr.
+			 */
+			bool first_slot;
+		} dynptr;
+		/* For stack slots that a local dynptr points to. We need to track
+		 * this to prohibit programs from using stack variables that are
+		 * pointed to by dynptrs as a dynptr, eg something like
+		 *
+		 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
+		 * bpf_dynptr_alloc(16, 0, &ptr);
+		 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
+		 */
+		bool is_dynptr_data;
+
 		/* Max size from any of the above. */
 		struct {
 			unsigned long raw1;
@@ -174,9 +195,16 @@ enum bpf_stack_slot_type {
 	STACK_SPILL,      /* register spilled into stack */
 	STACK_MISC,	  /* BPF program wrote some data into this slot */
 	STACK_ZERO,	  /* BPF program wrote constant zero */
+	/* A dynptr is stored in this stack slot. The type of dynptr
+	 * is stored in bpf_stack_state->spilled_ptr.dynptr.type
+	 */
+	STACK_DYNPTR,
 };
 
 #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
+/* size of a struct bpf_dynptr in bytes */
+#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
+#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
 
 struct bpf_stack_state {
 	struct bpf_reg_state spilled_ptr;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..e339b2697d9a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,42 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to local memory *data*.
+ *
+ *		For a dynptr to a dynamic memory allocation, please use
+ *		bpf_dynptr_alloc instead.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		*flags* is currently unused.
+ *	Return
+ *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
+ *		-EINVAL if flags is not 0.
+ *
+ * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_dynptr_alloc must have a corresponding
+ *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		Supported *flags* are __GFP_ZERO.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
+ *		if the flags are not supported.
+ *
+ * void bpf_dynptr_put(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_dynptr_alloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5375,9 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(dynptr_from_mem),		\
+	FN(dynptr_alloc),		\
+	FN(dynptr_put),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6486,6 +6525,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a47aae5c7335..87c14edda315 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,110 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
+		     u32 offset, u32 size)
+{
+	ptr->data = data;
+	ptr->offset = offset;
+	ptr->size = size;
+	bpf_dynptr_set_type(ptr, type);
+}
+
+void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
+{
+	memset(ptr, 0, sizeof(*ptr));
+}
+
+BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
+{
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err)
+		goto error;
+
+	/* flags is currently unsupported */
+	if (flags) {
+		err = -EINVAL;
+		goto error;
+	}
+
+	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_LOCAL, 0, size);
+
+	return 0;
+
+error:
+	bpf_dynptr_set_null(ptr);
+	return err;
+}
+
+const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
+	.func		= bpf_dynptr_from_mem,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT | NO_OBJ_REF,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
+};
+
+BPF_CALL_3(bpf_dynptr_alloc, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
+{
+	gfp_t gfp_flags = GFP_ATOMIC;
+	void *data;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err)
+		goto error;
+
+	if (flags) {
+		if (flags == __GFP_ZERO) {
+			gfp_flags |= flags;
+		} else {
+			err = -EINVAL;
+			goto error;
+		}
+	}
+
+	data = kmalloc(size, gfp_flags);
+	if (!data) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
+
+	return 0;
+
+error:
+	bpf_dynptr_set_null(ptr);
+	return err;
+}
+
+const struct bpf_func_proto bpf_dynptr_alloc_proto = {
+	.func		= bpf_dynptr_alloc,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
+};
+
+BPF_CALL_1(bpf_dynptr_put, struct bpf_dynptr_kern *, dynptr)
+{
+	kfree(dynptr->data);
+	bpf_dynptr_set_null(dynptr);
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_put_proto = {
+	.func		= bpf_dynptr_put,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1426,6 +1530,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_loop_proto;
 	case BPF_FUNC_strncmp:
 		return &bpf_strncmp_proto;
+	case BPF_FUNC_dynptr_from_mem:
+		return &bpf_dynptr_from_mem_proto;
+	case BPF_FUNC_dynptr_alloc:
+		return &bpf_dynptr_alloc_proto;
+	case BPF_FUNC_dynptr_put:
+		return &bpf_dynptr_put_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8deb588a19ce..bf132c6822e4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -187,6 +187,9 @@ struct bpf_verifier_stack_elem {
 					  POISON_POINTER_DELTA))
 #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
 
+/* forward declarations */
+static bool arg_type_is_mem_size(enum bpf_arg_type type);
+
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
 	return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
@@ -257,7 +260,9 @@ struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
-	bool release_ref;
+	u8 release_regno;
+	bool release_dynptr;
+	u8 uninit_dynptr_regno;
 };
 
 struct btf *btf_vmlinux;
@@ -576,6 +581,7 @@ static char slot_type_char[] = {
 	[STACK_SPILL]	= 'r',
 	[STACK_MISC]	= 'm',
 	[STACK_ZERO]	= '0',
+	[STACK_DYNPTR]	= 'd',
 };
 
 static void print_liveness(struct bpf_verifier_env *env,
@@ -591,6 +597,25 @@ static void print_liveness(struct bpf_verifier_env *env,
 		verbose(env, "D");
 }
 
+static inline int get_spi(s32 off)
+{
+	return (-off - 1) / BPF_REG_SIZE;
+}
+
+static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, u32 nr_slots)
+{
+	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+
+	/* We need to check that slots between [spi - nr_slots + 1, spi] are
+	 * within [0, allocated_stack).
+	 *
+	 * Please note that the spi grows downwards. For example, a dynptr
+	 * takes the size of two stack slots; the first slot will be at
+	 * spi and the second slot will be at spi - 1.
+	 */
+	return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
+}
+
 static struct bpf_func_state *func(struct bpf_verifier_env *env,
 				   const struct bpf_reg_state *reg)
 {
@@ -642,6 +667,191 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
+#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC)
+
+static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
+{
+	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
+	case DYNPTR_TYPE_LOCAL:
+		return BPF_DYNPTR_TYPE_LOCAL;
+	case DYNPTR_TYPE_MALLOC:
+		return BPF_DYNPTR_TYPE_MALLOC;
+	default:
+		return BPF_DYNPTR_TYPE_INVALID;
+	}
+}
+
+static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
+{
+	return type == BPF_DYNPTR_TYPE_MALLOC;
+}
+
+static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				   enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = cur_func(env);
+	enum bpf_dynptr_type type;
+	int spi, i;
+
+	spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	type = arg_to_dynptr_type(arg_type);
+	if (type == BPF_DYNPTR_TYPE_INVALID)
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_DYNPTR;
+		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
+	}
+
+	state->stack[spi].spilled_ptr.dynptr.type = type;
+	state->stack[spi - 1].spilled_ptr.dynptr.type = type;
+
+	state->stack[spi].spilled_ptr.dynptr.first_slot = true;
+
+	return 0;
+}
+
+static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi, i;
+
+	spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return -EINVAL;
+
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_INVALID;
+		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
+	}
+
+	state->stack[spi].spilled_ptr.dynptr.type = 0;
+	state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
+
+	state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
+
+	return 0;
+}
+
+static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
+			       struct bpf_reg_state *regs)
+{
+	struct bpf_func_state *state = cur_func(env);
+	struct bpf_reg_state *reg, *mem_reg = NULL;
+	enum bpf_arg_type arg_type;
+	u64 mem_size = 0;
+	u32 nr_slots;
+	int i, spi;
+
+	/* We must protect against the case where a program tries to do something
+	 * like this:
+	 *
+	 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
+	 * bpf_dynptr_alloc(16, 0, &ptr);
+	 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
+	 *
+	 * If ptr is a variable on the stack, we must mark the stack slot as
+	 * dynptr data when a local dynptr to it is created.
+	 */
+	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
+		arg_type = fn->arg_type[i];
+		reg = &regs[BPF_REG_1 + i];
+
+		if (base_type(arg_type) == ARG_PTR_TO_MEM) {
+			if (base_type(reg->type) == PTR_TO_STACK) {
+				mem_reg = reg;
+				continue;
+			}
+			/* if it's not a PTR_TO_STACK, then we don't need to
+			 * mark anything since it can never be used as a dynptr.
+			 * We can just return here since there will always be
+			 * only one ARG_PTR_TO_MEM in fn.
+			 */
+			return 0;
+		} else if (arg_type_is_mem_size(arg_type)) {
+			mem_size = roundup(reg->var_off.value, BPF_REG_SIZE);
+		}
+	}
+
+	if (!mem_reg || !mem_size) {
+		verbose(env, "verifier internal error: invalid ARG_PTR_TO_MEM args for %s\n", __func__);
+		return -EFAULT;
+	}
+
+	spi = get_spi(mem_reg->off);
+	if (!is_spi_bounds_valid(state, spi, mem_size)) {
+		verbose(env, "verifier internal error: variable not initialized on stack in %s\n", __func__);
+		return -EFAULT;
+	}
+
+	nr_slots = mem_size / BPF_REG_SIZE;
+	for (i = 0; i < nr_slots; i++)
+		state->stack[spi - i].spilled_ptr.is_dynptr_data = true;
+
+	return 0;
+}
+
+static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				       bool *is_dynptr_data)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi;
+
+	spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
+		return true;
+
+	if (state->stack[spi].slot_type[0] == STACK_DYNPTR ||
+	    state->stack[spi - 1].slot_type[0] == STACK_DYNPTR)
+		return false;
+
+	if (state->stack[spi].spilled_ptr.is_dynptr_data ||
+	    state->stack[spi - 1].spilled_ptr.is_dynptr_data) {
+		*is_dynptr_data = true;
+		return false;
+	}
+
+	return true;
+}
+
+static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				     enum bpf_arg_type arg_type)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+
+	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
+	    state->stack[spi].slot_type[0] != STACK_DYNPTR ||
+	    state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
+	    !state->stack[spi].spilled_ptr.dynptr.first_slot)
+		return false;
+
+	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
+	if (arg_type == ARG_PTR_TO_DYNPTR)
+		return true;
+
+	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
+}
+
+static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
+{
+	int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
+	int i;
+
+	for (i = 0; i < nr_slots; i++) {
+		if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
+			return true;
+	}
+
+	return false;
+}
+
 /* The reg state of a pointer or a bounded scalar was saved when
  * it was spilled to the stack.
  */
@@ -2878,6 +3088,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	}
 
 	mark_stack_slot_scratched(env, spi);
+
+	if (stack_access_into_dynptr(state, spi, size)) {
+		verbose(env, "direct write into dynptr is not permitted\n");
+		return -EINVAL;
+	}
+
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
 		if (dst_reg != BPF_REG_FP) {
@@ -2999,6 +3215,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
+
+		if (*stype == STACK_DYNPTR) {
+			verbose(env, "direct write into dynptr is not permitted\n");
+			return -EINVAL;
+		}
+
 		mark_stack_slot_scratched(env, spi);
 
 		if (!env->allow_ptr_leaks
@@ -5141,6 +5363,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
 	       type == ARG_PTR_TO_LONG;
 }
 
+static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
+{
+	return base_type(type) == ARG_PTR_TO_DYNPTR;
+}
+
+static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
+{
+	return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
+}
+
 static int int_ptr_type_to_size(enum bpf_arg_type type)
 {
 	if (type == ARG_PTR_TO_INT)
@@ -5278,6 +5510,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
+	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@@ -5450,10 +5683,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		return err;
 
 skip_type_check:
-	/* check_func_arg_reg_off relies on only one referenced register being
-	 * allowed for BPF helpers.
-	 */
 	if (reg->ref_obj_id) {
+		if (arg_type & NO_OBJ_REF) {
+			verbose(env, "Arg #%d cannot be a referenced object\n",
+				arg + 1);
+			return -EINVAL;
+		}
+
+		/* check_func_arg_reg_off relies on only one referenced register being
+		 * allowed for BPF helpers.
+		 */
 		if (meta->ref_obj_id) {
 			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
 				regno, reg->ref_obj_id,
@@ -5463,16 +5702,26 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
 	if (arg_type & OBJ_RELEASE) {
-		if (!reg->ref_obj_id) {
+		if (arg_type_is_dynptr(arg_type)) {
+			struct bpf_func_state *state = func(env, reg);
+			int spi = get_spi(reg->off);
+
+			if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
+			    !state->stack[spi].spilled_ptr.id) {
+				verbose(env, "arg %d is an unacquired reference\n", regno);
+				return -EINVAL;
+			}
+			meta->release_dynptr = true;
+		} else if (!reg->ref_obj_id) {
 			verbose(env, "arg %d is an unacquired reference\n", regno);
 			return -EINVAL;
 		}
-		if (meta->release_ref) {
-			verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
-				regno);
+		if (meta->release_regno) {
+			verbose(env, "verifier internal error: more than one release_regno %u %u\n",
+				meta->release_regno, regno);
 			return -EFAULT;
 		}
-		meta->release_ref = true;
+		meta->release_regno = regno;
 	}
 
 	if (arg_type == ARG_CONST_MAP_PTR) {
@@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
 		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
+	} else if (arg_type_is_dynptr(arg_type)) {
+		/* Can't pass in a dynptr at a weird offset */
+		if (reg->off % BPF_REG_SIZE) {
+			verbose(env, "cannot pass in non-zero dynptr offset\n");
+			return -EINVAL;
+		}
+
+		if (arg_type & MEM_UNINIT)  {
+			bool is_dynptr_data = false;
+
+			if (!is_dynptr_reg_valid_uninit(env, reg, &is_dynptr_data)) {
+				if (is_dynptr_data)
+					verbose(env, "Arg #%d cannot be a memory reference for another dynptr\n",
+						arg + 1);
+				else
+					verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
+						arg + 1);
+				return -EINVAL;
+			}
+
+			meta->uninit_dynptr_regno = arg + BPF_REG_1;
+		} else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
+			const char *err_extra = "";
+
+			switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
+			case DYNPTR_TYPE_LOCAL:
+				err_extra = "local ";
+				break;
+			case DYNPTR_TYPE_MALLOC:
+				err_extra = "malloc ";
+				break;
+			default:
+				break;
+			}
+			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
+				err_extra, arg + 1);
+			return -EINVAL;
+		}
 	} else if (arg_type_is_alloc_size(arg_type)) {
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
@@ -6545,6 +6832,28 @@ static int check_reference_leak(struct bpf_verifier_env *env)
 	return state->acquired_refs ? -EINVAL : 0;
 }
 
+/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
+ * not been released. Dynptrs to local memory do not need to be released.
+ */
+static int check_dynptr_unreleased(struct bpf_verifier_env *env)
+{
+	struct bpf_func_state *state = cur_func(env);
+	int allocated_slots, i;
+
+	allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+
+	for (i = 0; i < allocated_slots; i++) {
+		if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
+			if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
+				verbose(env, "spi=%d is an unreleased dynptr\n", i);
+				return -EINVAL;
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
 				   struct bpf_reg_state *regs)
 {
@@ -6686,8 +6995,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return err;
 	}
 
-	if (meta.release_ref) {
-		err = release_reference(env, meta.ref_obj_id);
+	regs = cur_regs(env);
+
+	if (meta.uninit_dynptr_regno) {
+		enum bpf_arg_type type;
+
+		/* we write BPF_W bits (4 bytes) at a time */
+		for (i = 0; i < BPF_DYNPTR_SIZE; i += 4) {
+			err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
+					       i, BPF_W, BPF_WRITE, -1, false);
+			if (err)
+				return err;
+		}
+
+		type = fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1];
+
+		err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno], type);
+		if (err)
+			return err;
+
+		if (type & DYNPTR_TYPE_LOCAL) {
+			err = mark_as_dynptr_data(env, fn, regs);
+			if (err)
+				return err;
+		}
+	}
+
+	if (meta.release_regno) {
+		if (meta.release_dynptr) {
+			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
+		} else {
+			err = release_reference(env, meta.ref_obj_id);
+		}
 		if (err) {
 			verbose(env, "func %s#%d reference has not been acquired before\n",
 				func_id_name(func_id), func_id);
@@ -6695,8 +7034,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
-	regs = cur_regs(env);
-
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env);
@@ -6704,6 +7041,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			verbose(env, "tail_call would lead to reference leak\n");
 			return err;
 		}
+		err = check_dynptr_unreleased(env);
+		if (err) {
+			verbose(env, "tail_call would lead to dynptr memory leak\n");
+			return err;
+		}
 		break;
 	case BPF_FUNC_get_local_storage:
 		/* check that flags argument in get_local_storage(map, flags) is 0,
@@ -11696,6 +12038,10 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
+				err = check_dynptr_unreleased(env);
+				if (err)
+					return err;
+
 				if (state->curframe) {
 					/* exit from nested function */
 					err = prepare_func_exit(env, &env->insn_idx);
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 096625242475..766dcbc73897 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     ]
     known_types = {
             '...',
@@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct bpf_dynptr',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..e339b2697d9a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,42 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Get a dynptr to local memory *data*.
+ *
+ *		For a dynptr to a dynamic memory allocation, please use
+ *		bpf_dynptr_alloc instead.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		*flags* is currently unused.
+ *	Return
+ *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
+ *		-EINVAL if flags is not 0.
+ *
+ * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Allocate memory of *size* bytes.
+ *
+ *		Every call to bpf_dynptr_alloc must have a corresponding
+ *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
+ *		succeeded.
+ *
+ *		The maximum *size* supported is DYNPTR_MAX_SIZE.
+ *		Supported *flags* are __GFP_ZERO.
+ *	Return
+ *		0 on success, -ENOMEM if there is not enough memory for the
+ *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
+ *		if *flags* is not supported.
+ *
+ * void bpf_dynptr_put(struct bpf_dynptr *ptr)
+ *	Description
+ *		Free memory allocated by bpf_dynptr_alloc.
+ *
+ *		After this operation, *ptr* will be an invalidated dynptr.
+ *	Return
+ *		Void.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5375,9 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(dynptr_from_mem),		\
+	FN(dynptr_alloc),		\
+	FN(dynptr_put),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -6486,6 +6525,11 @@ struct bpf_timer {
 	__u64 :64;
 } __attribute__((aligned(8)));
 
+struct bpf_dynptr {
+	__u64 :64;
+	__u64 :64;
+} __attribute__((aligned(8)));
+
 struct bpf_sysctl {
 	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
 				 * Allows 1,2,4-byte read, but no write.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v2 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
                   ` (2 preceding siblings ...)
  2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 5/7] bpf: Add dynptr data slices Joanne Koong
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds two helper functions, bpf_dynptr_read and
bpf_dynptr_write:

long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset);

long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len);

The dynptr passed into these functions must be a valid dynptr that has
been initialized.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 16 ++++++++++
 include/uapi/linux/bpf.h       | 19 ++++++++++++
 kernel/bpf/helpers.c           | 56 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 19 ++++++++++++
 4 files changed, 110 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fee91b07ee74..8eb32ec201bf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2434,6 +2434,12 @@ enum bpf_dynptr_type {
 #define DYNPTR_SIZE_MASK	0xFFFFFF
 #define DYNPTR_TYPE_SHIFT	28
 #define DYNPTR_TYPE_MASK	0x7
+#define DYNPTR_RDONLY_BIT	BIT(31)
+
+static inline bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->size & DYNPTR_RDONLY_BIT;
+}
 
 static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
 {
@@ -2455,6 +2461,16 @@ static inline int bpf_dynptr_check_size(u32 size)
 	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
 }
 
+static inline int bpf_dynptr_check_off_len(struct bpf_dynptr_kern *ptr, u32 offset, u32 len)
+{
+	u32 capacity = bpf_dynptr_get_size(ptr) - ptr->offset;
+
+	if (len > capacity || offset > capacity - len)
+		return -EINVAL;
+
+	return 0;
+}
+
 void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
 		     u32 offset, u32 size);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e339b2697d9a..abe9a221ef08 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5179,6 +5179,23 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
+ *		is a read-only dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5378,6 +5395,8 @@ union bpf_attr {
 	FN(dynptr_from_mem),		\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 87c14edda315..ae2239375c51 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1478,6 +1478,58 @@ const struct bpf_func_proto bpf_dynptr_put_proto = {
 	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
 };
 
+BPF_CALL_4(bpf_dynptr_read, void *, dst, u32, len, struct bpf_dynptr_kern *, src, u32, offset)
+{
+	int err;
+
+	if (!src->data)
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(src, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst, src->data + src->offset + offset, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_read_proto = {
+	.func		= bpf_dynptr_read,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_MEM_UNINIT,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_PTR_TO_DYNPTR,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_dynptr_write, struct bpf_dynptr_kern *, dst, u32, offset, void *, src, u32, len)
+{
+	int err;
+
+	if (!dst->data || bpf_dynptr_is_rdonly(dst))
+		return -EINVAL;
+
+	err = bpf_dynptr_check_off_len(dst, offset, len);
+	if (err)
+		return err;
+
+	memcpy(dst->data + dst->offset + offset, src, len);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_dynptr_write_proto = {
+	.func		= bpf_dynptr_write,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1536,6 +1588,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_alloc_proto;
 	case BPF_FUNC_dynptr_put:
 		return &bpf_dynptr_put_proto;
+	case BPF_FUNC_dynptr_read:
+		return &bpf_dynptr_read_proto;
+	case BPF_FUNC_dynptr_write:
+		return &bpf_dynptr_write_proto;
 	default:
 		break;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e339b2697d9a..abe9a221ef08 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5179,6 +5179,23 @@ union bpf_attr {
  *		After this operation, *ptr* will be an invalidated dynptr.
  *	Return
  *		Void.
+ *
+ * long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset)
+ *	Description
+ *		Read *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *src*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *src*'s data or if *src* is an invalid dynptr.
+ *
+ * long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len)
+ *	Description
+ *		Write *len* bytes from *src* into *dst*, starting from *offset*
+ *		into *dst*.
+ *	Return
+ *		0 on success, -EINVAL if *offset* + *len* exceeds the length
+ *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
+ *		is a read-only dynptr.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5378,6 +5395,8 @@ union bpf_attr {
 	FN(dynptr_from_mem),		\
 	FN(dynptr_alloc),		\
 	FN(dynptr_put),			\
+	FN(dynptr_read),		\
+	FN(dynptr_write),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2



* [PATCH bpf-next v2 5/7] bpf: Add dynptr data slices
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
                   ` (3 preceding siblings ...)
  2022-04-16  6:34 ` [PATCH bpf-next v2 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 6/7] bpf: Dynptr support for ring buffers Joanne Koong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds a new helper function

void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len);

which returns a pointer to the underlying data of a dynptr. *len*
must be a statically known value. The bpf program may access the
returned data slice as a normal buffer (e.g., it can do direct reads
and writes), since the verifier associates the length with the returned
pointer and enforces that no out-of-bounds accesses occur.

This requires a few additions to the verifier. For every
reference-tracked dynptr that is initialized, we create a unique id
and attach any data slices for that dynptr to the id. When a release
function is called on the dynptr (e.g., bpf_dynptr_put), we invalidate
all slices that correspond to that dynptr id. This ensures a slice
can't be used after its dynptr has been invalidated.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf_verifier.h   |  2 +
 include/uapi/linux/bpf.h       | 12 +++++
 kernel/bpf/helpers.c           | 28 +++++++++++
 kernel/bpf/verifier.c          | 88 +++++++++++++++++++++++++++++++---
 tools/include/uapi/linux/bpf.h | 12 +++++
 5 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index e11440a44e92..f914e00a300c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -109,6 +109,8 @@ struct bpf_reg_state {
 	 * for the purpose of tracking that it's freed.
 	 * For PTR_TO_SOCKET this is used to share which pointers retain the
 	 * same reference to the socket, to determine proper reference freeing.
+	 * For stack slots that are dynptrs, this is used to track references to
+	 * the dynptr to determine proper reference freeing.
 	 */
 	u32 id;
 	/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index abe9a221ef08..a47e8b787033 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5196,6 +5196,17 @@ union bpf_attr {
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
  *		is a read-only dynptr.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the dynptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		are out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5397,6 +5408,7 @@ union bpf_attr {
 	FN(dynptr_put),			\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ae2239375c51..5bcc640a39db 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1530,6 +1530,32 @@ const struct bpf_func_proto bpf_dynptr_write_proto = {
 	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
+BPF_CALL_3(bpf_dynptr_data, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
+{
+	int err;
+
+	if (!ptr->data)
+		return 0;
+
+	err = bpf_dynptr_check_off_len(ptr, offset, len);
+	if (err)
+		return 0;
+
+	if (bpf_dynptr_is_rdonly(ptr))
+		return 0;
+
+	return (unsigned long)(ptr->data + ptr->offset + offset);
+}
+
+const struct bpf_func_proto bpf_dynptr_data_proto = {
+	.func		= bpf_dynptr_data,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_DYNPTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1592,6 +1618,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_dynptr_read_proto;
 	case BPF_FUNC_dynptr_write:
 		return &bpf_dynptr_write_proto;
+	case BPF_FUNC_dynptr_data:
+		return &bpf_dynptr_data_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bf132c6822e4..06b29802c4ec 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -189,6 +189,9 @@ struct bpf_verifier_stack_elem {
 
 /* forward declarations */
 static bool arg_type_is_mem_size(enum bpf_arg_type type);
+static void release_reg_references(struct bpf_verifier_env *env,
+				   struct bpf_func_state *state,
+				   int ref_obj_id);
 
 static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
 {
@@ -483,7 +486,8 @@ static bool may_be_acquire_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_sk_lookup_udp ||
 		func_id == BPF_FUNC_skc_lookup_tcp ||
 		func_id == BPF_FUNC_map_lookup_elem ||
-	        func_id == BPF_FUNC_ringbuf_reserve;
+		func_id == BPF_FUNC_ringbuf_reserve ||
+		func_id == BPF_FUNC_dynptr_data;
 }
 
 static bool is_acquire_function(enum bpf_func_id func_id,
@@ -494,7 +498,8 @@ static bool is_acquire_function(enum bpf_func_id func_id,
 	if (func_id == BPF_FUNC_sk_lookup_tcp ||
 	    func_id == BPF_FUNC_sk_lookup_udp ||
 	    func_id == BPF_FUNC_skc_lookup_tcp ||
-	    func_id == BPF_FUNC_ringbuf_reserve)
+	    func_id == BPF_FUNC_ringbuf_reserve ||
+	    func_id == BPF_FUNC_dynptr_data)
 		return true;
 
 	if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -516,6 +521,11 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
 		func_id == BPF_FUNC_skc_to_tcp_request_sock;
 }
 
+static inline bool is_dynptr_ref_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_dynptr_data;
+}
+
 static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 {
 	return BPF_CLASS(insn->code) == BPF_STX &&
@@ -691,7 +701,7 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 {
 	struct bpf_func_state *state = cur_func(env);
 	enum bpf_dynptr_type type;
-	int spi, i;
+	int spi, id, i;
 
 	spi = get_spi(reg->off);
 
@@ -712,11 +722,25 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 
 	state->stack[spi].spilled_ptr.dynptr.first_slot = true;
 
+	/* Generate an id for the dynptr if the dynptr type can be
+	 * acquired/released.
+	 *
+	 * This is used to associate data slices with dynptrs, so that
+	 * if a dynptr gets invalidated, its data slices will also be
+	 * invalidated.
+	 */
+	if (dynptr_type_refcounted(type)) {
+		id = ++env->id_gen;
+		state->stack[spi].spilled_ptr.id = id;
+		state->stack[spi - 1].spilled_ptr.id = id;
+	}
+
 	return 0;
 }
 
 static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
+	struct bpf_verifier_state *vstate = env->cur_state;
 	struct bpf_func_state *state = func(env, reg);
 	int spi, i;
 
@@ -730,6 +754,15 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
 		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
 	}
 
+	/* Invalidate any slices associated with this dynptr */
+	if (dynptr_type_refcounted(state->stack[spi].spilled_ptr.dynptr.type)) {
+		for (i = 0; i <= vstate->curframe; i++)
+			release_reg_references(env, vstate->frame[i],
+					       state->stack[spi].spilled_ptr.id);
+		state->stack[spi].spilled_ptr.id = 0;
+		state->stack[spi - 1].spilled_ptr.id = 0;
+	}
+
 	state->stack[spi].spilled_ptr.dynptr.type = 0;
 	state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
 
@@ -839,6 +872,20 @@ static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_re
 	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
 }
 
+static bool is_ref_obj_id_dynptr(struct bpf_func_state *state, u32 id)
+{
+	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
+	int i;
+
+	for (i = 0; i < allocated_slots; i++) {
+		if (state->stack[i].slot_type[0] == STACK_DYNPTR &&
+		    state->stack[i].spilled_ptr.id == id)
+			return true;
+	}
+
+	return false;
+}
+
 static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
 {
 	int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
@@ -5630,6 +5677,14 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	return __check_ptr_off_reg(env, reg, regno, fixed_off_ok);
 }
 
+static inline u32 stack_slot_get_id(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+{
+	struct bpf_func_state *state = func(env, reg);
+	int spi = get_spi(reg->off);
+
+	return state->stack[spi].spilled_ptr.id;
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn)
@@ -7191,10 +7246,28 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		/* For release_reference() */
 		regs[BPF_REG_0].ref_obj_id = meta.ref_obj_id;
 	} else if (is_acquire_function(func_id, meta.map_ptr)) {
-		int id = acquire_reference_state(env, insn_idx);
+		int id;
+
+		if (is_dynptr_ref_function(func_id)) {
+			int i;
+
+			/* Find the id of the dynptr we're acquiring a reference to */
+			for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
+				if (arg_type_is_dynptr(fn->arg_type[i])) {
+					id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]);
+					break;
+				}
+			}
+			if (unlikely(i == MAX_BPF_FUNC_REG_ARGS)) {
+				verbose(env, "verifier internal error: no dynptr args to a dynptr ref function");
+				return -EFAULT;
+			}
+		} else {
+			id = acquire_reference_state(env, insn_idx);
+			if (id < 0)
+				return id;
+		}
 
-		if (id < 0)
-			return id;
 		/* For mark_ptr_or_null_reg() */
 		regs[BPF_REG_0].id = id;
 		/* For release_reference() */
@@ -9630,7 +9703,8 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
 	u32 id = regs[regno].id;
 	int i;
 
-	if (ref_obj_id && ref_obj_id == id && is_null)
+	if (ref_obj_id && ref_obj_id == id && is_null &&
+	    !is_ref_obj_id_dynptr(state, id))
 		/* regs[regno] is in the " == NULL" branch.
 		 * No one could have freed the reference state before
 		 * doing the NULL check.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index abe9a221ef08..a47e8b787033 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5196,6 +5196,17 @@ union bpf_attr {
  *		0 on success, -EINVAL if *offset* + *len* exceeds the length
  *		of *dst*'s data or if *dst* is an invalid dynptr or if *dst*
  *		is a read-only dynptr.
+ *
+ * void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len)
+ *	Description
+ *		Get a pointer to the underlying dynptr data.
+ *
+ *		*len* must be a statically known value. The returned data slice
+ *		is invalidated whenever the dynptr is invalidated.
+ *	Return
+ *		Pointer to the underlying dynptr data, NULL if the dynptr is
+ *		read-only, if the dynptr is invalid, or if the offset and length
+ *		are out of bounds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5397,6 +5408,7 @@ union bpf_attr {
 	FN(dynptr_put),			\
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
+	FN(dynptr_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2



* [PATCH bpf-next v2 6/7] bpf: Dynptr support for ring buffers
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
                   ` (4 preceding siblings ...)
  2022-04-16  6:34 ` [PATCH bpf-next v2 5/7] bpf: Add dynptr data slices Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16  6:34 ` [PATCH bpf-next v2 7/7] bpf: Dynptr tests Joanne Koong
  2022-04-16  8:13 ` [PATCH bpf-next v2 0/7] Dynamic pointers Kumar Kartikeya Dwivedi
  7 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

Currently, our only way of writing dynamically-sized data into a ring
buffer is through bpf_ringbuf_output, but this incurs an extra memcpy
cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra
memcpy, but it can only safely support reservation sizes that are
statically known, since the verifier cannot guarantee that the bpf
program won't access memory outside the reserved space.

The bpf_dynptr abstraction allows for dynamically-sized ring buffer
reservations without the extra memcpy.

There are 3 new APIs:

long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr);
void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags);
void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags);

These closely follow the functionality of the original ringbuf APIs.
For example, all ring buffer dynptrs that have been reserved must be
either submitted or discarded before the program exits.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h            | 10 ++++-
 include/uapi/linux/bpf.h       | 35 +++++++++++++++++
 kernel/bpf/helpers.c           |  6 +++
 kernel/bpf/ringbuf.c           | 71 ++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c          | 18 +++++++--
 tools/include/uapi/linux/bpf.h | 35 +++++++++++++++++
 6 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8eb32ec201bf..d0a8b46d2ec3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -355,7 +355,10 @@ enum bpf_type_flag {
 	/* May not be a referenced object */
 	NO_OBJ_REF		= BIT(9 + BPF_BASE_TYPE_BITS),
 
-	__BPF_TYPE_LAST_FLAG	= NO_OBJ_REF,
+	/* DYNPTR points to a ringbuf record. */
+	DYNPTR_TYPE_RINGBUF	= BIT(10 + BPF_BASE_TYPE_BITS),
+
+	__BPF_TYPE_LAST_FLAG	= DYNPTR_TYPE_RINGBUF,
 };
 
 /* Max number of base types. */
@@ -2256,6 +2259,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
 extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
 extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
 extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto;
+extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto;
 extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto;
@@ -2425,6 +2431,8 @@ enum bpf_dynptr_type {
 	BPF_DYNPTR_TYPE_LOCAL,
 	/* Memory allocated dynamically by the kernel for the dynptr */
 	BPF_DYNPTR_TYPE_MALLOC,
+	/* Underlying data is a ringbuf record */
+	BPF_DYNPTR_TYPE_RINGBUF,
 };
 
 /* Since the upper 8 bits of dynptr->size is reserved, the
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a47e8b787033..b2485ff4d683 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5207,6 +5207,38 @@ union bpf_attr {
  *		Pointer to the underlying dynptr data, NULL if the dynptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
 *		are out of bounds.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *ptr*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5409,6 +5441,9 @@ union bpf_attr {
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
 	FN(dynptr_data),		\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5bcc640a39db..4731b9a818e5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1602,6 +1602,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_ringbuf_discard_proto;
 	case BPF_FUNC_ringbuf_query:
 		return &bpf_ringbuf_query_proto;
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+		return &bpf_ringbuf_reserve_dynptr_proto;
+	case BPF_FUNC_ringbuf_submit_dynptr:
+		return &bpf_ringbuf_submit_dynptr_proto;
+	case BPF_FUNC_ringbuf_discard_dynptr:
+		return &bpf_ringbuf_discard_dynptr_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_loop:
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 5173fd37590f..5c66d598e82f 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -475,3 +475,74 @@ const struct bpf_func_proto bpf_ringbuf_query_proto = {
 	.arg1_type	= ARG_CONST_MAP_PTR,
 	.arg2_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_4(bpf_ringbuf_reserve_dynptr, struct bpf_map *, map, u32, size, u64, flags,
+	   struct bpf_dynptr_kern *, ptr)
+{
+	void *sample;
+	int err;
+
+	err = bpf_dynptr_check_size(size);
+	if (err) {
+		bpf_dynptr_set_null(ptr);
+		return err;
+	}
+
+	sample = (void __force *)____bpf_ringbuf_reserve(map, size, flags);
+
+	if (!sample) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, sample, BPF_DYNPTR_TYPE_RINGBUF, 0, size);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto = {
+	.func		= bpf_ringbuf_reserve_dynptr,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | MEM_UNINIT,
+};
+
+BPF_CALL_2(bpf_ringbuf_submit_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_submit(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto = {
+	.func		= bpf_ringbuf_submit_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_ringbuf_discard_dynptr, struct bpf_dynptr_kern *, ptr, u64, flags)
+{
+	if (!ptr->data)
+		return 0;
+
+	____bpf_ringbuf_discard(ptr->data, flags);
+
+	bpf_dynptr_set_null(ptr);
+
+	return 0;
+}
+
+const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto = {
+	.func		= bpf_ringbuf_discard_dynptr,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_RINGBUF | OBJ_RELEASE,
+	.arg2_type	= ARG_ANYTHING,
+};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 06b29802c4ec..5b5cb221dda6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -677,7 +677,7 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
 	env->scratched_stack_slots = ~0ULL;
 }
 
-#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC)
+#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC | DYNPTR_TYPE_RINGBUF)
 
 static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 {
@@ -686,6 +686,8 @@ static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 		return BPF_DYNPTR_TYPE_LOCAL;
 	case DYNPTR_TYPE_MALLOC:
 		return BPF_DYNPTR_TYPE_MALLOC;
+	case DYNPTR_TYPE_RINGBUF:
+		return BPF_DYNPTR_TYPE_RINGBUF;
 	default:
 		return BPF_DYNPTR_TYPE_INVALID;
 	}
@@ -693,7 +695,7 @@ static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
 
 static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
 {
-	return type == BPF_DYNPTR_TYPE_MALLOC;
+	return type == BPF_DYNPTR_TYPE_MALLOC || type == BPF_DYNPTR_TYPE_RINGBUF;
 }
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
@@ -5900,9 +5902,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			case DYNPTR_TYPE_MALLOC:
 				err_extra = "malloc ";
 				break;
+			case DYNPTR_TYPE_RINGBUF:
+				err_extra = "ringbuf ";
+				break;
 			default:
 				break;
 			}
+
 			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
 				err_extra, arg + 1);
 			return -EINVAL;
@@ -6024,7 +6030,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_RINGBUF:
 		if (func_id != BPF_FUNC_ringbuf_output &&
 		    func_id != BPF_FUNC_ringbuf_reserve &&
-		    func_id != BPF_FUNC_ringbuf_query)
+		    func_id != BPF_FUNC_ringbuf_query &&
+		    func_id != BPF_FUNC_ringbuf_reserve_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_submit_dynptr &&
+		    func_id != BPF_FUNC_ringbuf_discard_dynptr)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -6140,6 +6149,9 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_ringbuf_output:
 	case BPF_FUNC_ringbuf_reserve:
 	case BPF_FUNC_ringbuf_query:
+	case BPF_FUNC_ringbuf_reserve_dynptr:
+	case BPF_FUNC_ringbuf_submit_dynptr:
+	case BPF_FUNC_ringbuf_discard_dynptr:
 		if (map->map_type != BPF_MAP_TYPE_RINGBUF)
 			goto error;
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a47e8b787033..b2485ff4d683 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5207,6 +5207,38 @@ union bpf_attr {
  *		Pointer to the underlying dynptr data, NULL if the dynptr is
  *		read-only, if the dynptr is invalid, or if the offset and length
  *		is out of bounds.
+ *
+ * long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr)
+ *	Description
+ *		Reserve *size* bytes of payload in a ring buffer *ringbuf*
+ *		through the dynptr interface. *flags* must be 0.
+ *
+ *		Please note that a corresponding bpf_ringbuf_submit_dynptr or
+ *		bpf_ringbuf_discard_dynptr must be called on *ptr*, even if the
+ *		reservation fails. This is enforced by the verifier.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Submit reserved ring buffer sample, pointed to by *ptr*,
+ *		through the dynptr interface. This is a no-op if the dynptr is
+ *		invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_submit'.
+ *	Return
+ *		Nothing. Always succeeds.
+ *
+ * void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags)
+ *	Description
+ *		Discard reserved ring buffer sample through the dynptr
+ *		interface. This is a no-op if the dynptr is invalid/null.
+ *
+ *		For more information on *flags*, please see
+ *		'bpf_ringbuf_discard'.
+ *	Return
+ *		Nothing. Always succeeds.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5409,6 +5441,9 @@ union bpf_attr {
 	FN(dynptr_read),		\
 	FN(dynptr_write),		\
 	FN(dynptr_data),		\
+	FN(ringbuf_reserve_dynptr),	\
+	FN(ringbuf_submit_dynptr),	\
+	FN(ringbuf_discard_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2
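[Editorial note: to make the new API concrete, the reserve/submit flow documented in the hunk above can be sketched as a BPF program fragment. This is an illustrative sketch only — the `event` struct, the `rb` map, and the tracepoint are hypothetical; the helper signatures are the ones declared in this patch, and note that a matching submit or discard is required even when the reservation fails.]

```
struct event {
	int pid;
	char comm[16];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 12);
} rb SEC(".maps");

SEC("tp/syscalls/sys_enter_nanosleep")
int demo(void *ctx)
{
	struct bpf_dynptr ptr;
	struct event *e;

	/* Reserve through the dynptr interface. On failure the dynptr is
	 * null, but the verifier still requires a matching discard.
	 */
	if (bpf_ringbuf_reserve_dynptr(&rb, sizeof(*e), 0, &ptr)) {
		bpf_ringbuf_discard_dynptr(&ptr, 0);
		return 0;
	}

	e = bpf_dynptr_data(&ptr, 0, sizeof(*e));
	if (!e) {
		bpf_ringbuf_discard_dynptr(&ptr, 0);
		return 0;
	}

	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_ringbuf_submit_dynptr(&ptr, 0);
	return 0;
}
```

The same pattern appears in the test_ringbuf selftest in patch 7/7 of this series.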


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v2 7/7] bpf: Dynptr tests
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
                   ` (5 preceding siblings ...)
  2022-04-16  6:34 ` [PATCH bpf-next v2 6/7] bpf: Dynptr support for ring buffers Joanne Koong
@ 2022-04-16  6:34 ` Joanne Koong
  2022-04-16  8:13 ` [PATCH bpf-next v2 0/7] Dynamic pointers Kumar Kartikeya Dwivedi
  7 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-16  6:34 UTC (permalink / raw)
  To: bpf; +Cc: andrii, memxor, ast, daniel, toke, Joanne Koong

This patch adds tests for dynptrs. These include scenarios that the
verifier needs to reject, as well as some successful use cases of
dynptrs that should pass.

Some of the failure scenarios covered are invalid bpf_dynptr_put calls,
invalid writes, invalid reads, and invalid ringbuf API usages.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/dynptr.c | 138 ++++
 .../testing/selftests/bpf/progs/dynptr_fail.c | 643 ++++++++++++++++++
 .../selftests/bpf/progs/dynptr_success.c      | 217 ++++++
 3 files changed, 998 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
 create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c

diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
new file mode 100644
index 000000000000..5bf161e1838c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <test_progs.h>
+#include "dynptr_fail.skel.h"
+#include "dynptr_success.skel.h"
+
+static size_t log_buf_sz = 1048576; /* 1 MB */
+static char obj_log_buf[1048576];
+
+struct {
+	const char *prog_name;
+	const char *expected_err_msg;
+} dynptr_tests[] = {
+	/* failure cases */
+	{"missing_put", "spi=0 is an unreleased dynptr"},
+	{"missing_put_callback", "spi=0 is an unreleased dynptr"},
+	{"put_nonalloc", "arg 1 is an unacquired reference"},
+	{"put_data_slice", "type=alloc_mem expected=fp"},
+	{"put_uninit_dynptr", "arg 1 is an unacquired reference"},
+	{"use_after_put", "Expected an initialized dynptr as arg #3"},
+	{"alloc_twice", "Arg #3 dynptr has to be an uninitialized dynptr"},
+	{"add_dynptr_to_map1", "invalid indirect read from stack"},
+	{"add_dynptr_to_map2", "invalid indirect read from stack"},
+	{"ringbuf_invalid_access", "invalid mem access 'scalar'"},
+	{"ringbuf_invalid_api",
+		"func bpf_ringbuf_submit#132 reference has not been acquired before"},
+	{"ringbuf_out_of_bounds", "value is outside of the allowed memory range"},
+	{"data_slice_out_of_bounds", "value is outside of the allowed memory range"},
+	{"data_slice_use_after_put", "invalid mem access 'scalar'"},
+	{"invalid_helper1", "invalid indirect read from stack"},
+	{"invalid_helper2", "Expected an initialized dynptr as arg #3"},
+	{"invalid_write1", "direct write into dynptr is not permitted"},
+	{"invalid_write2", "direct write into dynptr is not permitted"},
+	{"invalid_write3", "direct write into dynptr is not permitted"},
+	{"invalid_write4", "direct write into dynptr is not permitted"},
+	{"invalid_read1", "invalid read from stack"},
+	{"invalid_read2", "cannot pass in non-zero dynptr offset"},
+	{"invalid_read3", "invalid read from stack"},
+	{"invalid_offset", "invalid write to stack"},
+	{"global", "R3 type=map_value expected=fp"},
+	{"put_twice", "arg 1 is an unacquired reference"},
+	{"put_twice_callback", "arg 1 is an unacquired reference"},
+	{"invalid_nested_dynptrs1", "direct write into dynptr is not permitted"},
+	{"invalid_nested_dynptrs2", "Arg #3 cannot be a memory reference for another dynptr"},
+	{"invalid_ref_mem1", "Arg #1 cannot be a referenced object"},
+	{"invalid_ref_mem2", "Arg #1 cannot be a referenced object"},
+	{"zero_slice_access", "invalid access to memory, mem_size=0 off=0 size=1"},
+	/* success cases */
+	{"test_basic", NULL},
+	{"test_data_slice", NULL},
+	{"test_ringbuf", NULL},
+	{"test_alloc_zero_bytes", NULL},
+};
+
+static void verify_fail(const char *prog_name, const char *expected_err_msg)
+{
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+	struct bpf_program *prog;
+	struct dynptr_fail *skel;
+	int err;
+
+	opts.kernel_log_buf = obj_log_buf;
+	opts.kernel_log_size = log_buf_sz;
+	opts.kernel_log_level = 1;
+
+	skel = dynptr_fail__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "dynptr_fail__open_opts"))
+		return;
+
+	bpf_object__for_each_program(prog, skel->obj)
+		bpf_program__set_autoload(prog, false);
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		return;
+
+	bpf_program__set_autoload(prog, true);
+
+	err = dynptr_fail__load(skel);
+
+	ASSERT_ERR(err, "dynptr_fail__load");
+
+	if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
+		fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
+		fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
+	}
+
+	dynptr_fail__destroy(skel);
+}
+
+static void verify_success(const char *prog_name)
+{
+	struct dynptr_success *skel;
+	struct bpf_program *prog;
+	struct bpf_link *link;
+	int err;
+
+	skel = dynptr_success__open();
+	if (!ASSERT_OK_PTR(skel, "dynptr_success__open"))
+		return;
+
+	skel->bss->pid = getpid();
+	err = dynptr_success__load(skel);
+	if (!ASSERT_OK(err, "dynptr_success__load"))
+		return;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		return;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach"))
+		return;
+
+	usleep(1);
+
+	ASSERT_EQ(skel->bss->err, 0, "err");
+
+	bpf_link__destroy(link);
+
+	dynptr_success__destroy(skel);
+}
+
+void test_dynptr(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(dynptr_tests); i++) {
+		if (!test__start_subtest(dynptr_tests[i].prog_name))
+			continue;
+
+		if (dynptr_tests[i].expected_err_msg)
+			verify_fail(dynptr_tests[i].prog_name, dynptr_tests[i].expected_err_msg);
+		else
+			verify_success(dynptr_tests[i].prog_name);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
new file mode 100644
index 000000000000..215069cb7e0d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -0,0 +1,643 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct bpf_dynptr);
+} array_map SEC(".maps");
+
+struct sample {
+	int pid;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+int err = 0;
+int val;
+
+/* Every bpf_dynptr_alloc call must have a corresponding bpf_dynptr_put call */
+SEC("raw_tp/sys_nanosleep")
+int missing_put(void *ctx)
+{
+	struct bpf_dynptr mem;
+
+	bpf_dynptr_alloc(8, 0, &mem);
+
+	/* missing a call to bpf_dynptr_put(&mem) */
+
+	return 0;
+}
+
+/* A non-alloc-ed dynptr can't be used by bpf_dynptr_put */
+SEC("raw_tp/sys_nanosleep")
+int put_nonalloc(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u32 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A data slice from a dynptr can't be used by bpf_dynptr_put */
+SEC("raw_tp/sys_nanosleep")
+int put_data_slice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* this should fail */
+	bpf_dynptr_put(data);
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/* Can't call bpf_dynptr_put on a non-initialized dynptr */
+SEC("raw_tp/sys_nanosleep")
+int put_uninit_dynptr(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	/* this should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A dynptr can't be used after bpf_dynptr_put has been called on it */
+SEC("raw_tp/sys_nanosleep")
+int use_after_put(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	bpf_dynptr_put(&ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	return 0;
+}
+
+/*
+ * Can't bpf_dynptr_alloc an existing allocated bpf_dynptr that bpf_dynptr_put
+ * hasn't been called on yet
+ */
+SEC("raw_tp/sys_nanosleep")
+int alloc_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_alloc(2, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/*
+ * Can't access a ring buffer record after submit or discard has been called
+ * on the dynptr
+ */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_invalid_access(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	/* this should fail */
+	err = sample->pid;
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't call non-dynptr ringbuf APIs on a dynptr ringbuf sample */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_invalid_api(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	sample->pid = 123;
+
+	/* invalid API use. need to use dynptr API to submit/discard */
+	bpf_ringbuf_submit(sample, 0);
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't access memory outside a ringbuf record range */
+SEC("raw_tp/sys_nanosleep")
+int ringbuf_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	/* Can't access beyond sample range */
+	*(__u8 *)((void *)sample + sizeof(*sample)) = 123;
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't add a dynptr to a map */
+SEC("raw_tp/sys_nanosleep")
+int add_dynptr_to_map1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char buf[64] = {};
+	int key = 0;
+
+	err = bpf_dynptr_from_mem(buf, sizeof(buf), 0, &ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &ptr, 0);
+
+	return 0;
+}
+
+/* Can't add a struct with an embedded dynptr to a map */
+SEC("raw_tp/sys_nanosleep")
+int add_dynptr_to_map2(void *ctx)
+{
+	struct info {
+		int x;
+		struct bpf_dynptr ptr;
+	};
+	struct info x;
+	int key = 0;
+
+	bpf_dynptr_alloc(8, 0, &x.ptr);
+
+	/* this should fail */
+	bpf_map_update_elem(&array_map, &key, &x, 0);
+
+	return 0;
+}
+
+/* Can't pass in a dynptr as an arg to a helper function that doesn't take in a
+ * dynptr argument
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_helper1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	/* this should fail */
+	bpf_strncmp((const char *)&ptr, sizeof(ptr), "hello!");
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* A dynptr can't be passed into a helper function at a non-zero offset */
+SEC("raw_tp/sys_nanosleep")
+int invalid_helper2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 8, 0);
+
+	return 0;
+}
+
+/* A data slice can't be accessed out of bounds */
+SEC("raw_tp/sys_nanosleep")
+int data_slice_out_of_bounds(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* can't index out of bounds of the data slice */
+	val = *((char *)data + 8);
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/* A data slice can't be used after bpf_dynptr_put is called */
+SEC("raw_tp/sys_nanosleep")
+int data_slice_use_after_put(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	void *data;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	bpf_dynptr_put(&ptr);
+
+	/* this should fail */
+	val = *(__u8 *)data;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+/*
+ * A bpf_dynptr can't be written directly to by the bpf program,
+ * only through dynptr helper functions
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u8 x = 0;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	/* this should fail */
+	memcpy(&ptr, &x, sizeof(x));
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/*
+ * A bpf_dynptr at a non-zero offset can't be written directly to
+ * by the bpf program, only through dynptr helper functions
+ */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u8 x = 0, y = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	memcpy((void *)&ptr, &y, sizeof(y));
+
+	bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+
+	return 0;
+}
+
+/* A non-const write into a dynptr is not permitted */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write3(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char stack_buf[16];
+	unsigned long len;
+	__u8 x = 0;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	memcpy(stack_buf, &val, sizeof(val));
+	len = stack_buf[0] & 0xf;
+
+	/* this should fail */
+	memcpy((void *)&ptr + len, &x, sizeof(x));
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+static int invalid_write4_callback(__u32 index, void *data)
+{
+	/* this should fail */
+	*(__u32 *)data = 123;
+
+	bpf_dynptr_put(data);
+
+	return 0;
+}
+
+/* An invalid write can't occur in a callback function */
+SEC("raw_tp/sys_nanosleep")
+int invalid_write4(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	bpf_loop(10, invalid_write4_callback, &ptr, 0);
+
+	return 0;
+}
+
+/* A globally-defined bpf_dynptr can't be used (it must reside on the stack) */
+struct bpf_dynptr global_dynptr;
+SEC("raw_tp/sys_nanosleep")
+int global(void *ctx)
+{
+	/* this should fail */
+	bpf_dynptr_alloc(4, 0, &global_dynptr);
+
+	bpf_dynptr_put(&global_dynptr);
+
+	return 0;
+}
+
+/* A direct read should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read1(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u32 x = 2;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	val = *(int *)&ptr;
+
+	return 0;
+}
+
+/* A direct read at an offset should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read2(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	char read_data[64] = {};
+	__u64 x = 0;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_read(read_data, sizeof(read_data), (void *)&ptr + 1, 0);
+
+	return 0;
+}
+
+/* A direct read at an offset into the lower stack slot should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_read3(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	struct bpf_dynptr ptr2 = {};
+	__u32 x = 2;
+
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr);
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr2);
+
+	/* this should fail */
+	memcpy(&val, (void *)&ptr + 8, sizeof(val));
+
+	return 0;
+}
+
+/* Calling bpf_dynptr_from_mem on an offset should fail */
+SEC("raw_tp/sys_nanosleep")
+int invalid_offset(void *ctx)
+{
+	struct bpf_dynptr ptr = {};
+	__u64 x = 0;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(&x, sizeof(x), 0, &ptr + 1);
+
+	return 0;
+}
+
+/* Can't call bpf_dynptr_put twice */
+SEC("raw_tp/sys_nanosleep")
+int put_twice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	/* this second put should fail */
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+static int put_twice_callback_fn(__u32 index, void *data)
+{
+	/* this should fail */
+	bpf_dynptr_put(data);
+	val = index;
+	return 0;
+}
+
+/* Test that calling bpf_dynptr_put twice, where the second put happens within
+ * a callback function, fails
+ */
+SEC("raw_tp/sys_nanosleep")
+int put_twice_callback(void *ctx)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	bpf_dynptr_put(&ptr);
+
+	bpf_loop(10, put_twice_callback_fn, &ptr, 0);
+
+	return 0;
+}
+
+static int missing_put_callback_fn(__u32 index, void *data)
+{
+	struct bpf_dynptr ptr;
+
+	bpf_dynptr_alloc(8, 0, &ptr);
+
+	val = index;
+
+	/* missing bpf_dynptr_put(&ptr) */
+
+	return 0;
+}
+
+/* Any dynptr initialized within a callback must have bpf_dynptr_put called */
+SEC("raw_tp/sys_nanosleep")
+int missing_put_callback(void *ctx)
+{
+	bpf_loop(10, missing_put_callback_fn, NULL, 0);
+	return 0;
+}
+
+/* We can't have nested dynptrs or else the dynptr stack data can be written into */
+SEC("raw_tp/sys_nanosleep")
+int invalid_nested_dynptrs1(void *ctx)
+{
+	struct bpf_dynptr local;
+	struct bpf_dynptr ptr;
+	char write_data[64] = {};
+
+	bpf_dynptr_alloc(16, 0, &ptr);
+
+	/* this should fail */
+	bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
+
+	bpf_dynptr_write(&local, 0, write_data, sizeof(ptr));
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+SEC("raw_tp/sys_nanosleep")
+int invalid_nested_dynptrs2(void *ctx)
+{
+	struct bpf_dynptr local;
+	struct bpf_dynptr ptr;
+	char write_data[64] = {};
+
+	bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
+
+	/* this should fail */
+	bpf_dynptr_alloc(16, 0, &ptr);
+
+	bpf_dynptr_write(&local, 0, write_data, sizeof(ptr));
+
+	bpf_dynptr_put(&ptr);
+
+	return 0;
+}
+
+/* Can't have local dynptr to referenced memory */
+SEC("raw_tp/sys_nanosleep")
+int invalid_ref_mem1(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct bpf_dynptr local;
+	void *data;
+
+	bpf_dynptr_alloc(16, 0, &ptr);
+	data = bpf_dynptr_data(&ptr, 0, 8);
+	if (!data)
+		goto done;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(data, 1, 0, &local);
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+SEC("raw_tp/sys_nanosleep")
+int invalid_ref_mem2(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct bpf_dynptr local;
+	struct sample *sample;
+
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(*sample), 0, &ptr);
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample)
+		goto done;
+
+	/* this should fail */
+	bpf_dynptr_from_mem(sample, sizeof(*sample), 0, &local);
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+/* Can't access memory in a zero-slice */
+SEC("raw_tp/sys_nanosleep")
+int zero_slice_access(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+
+	bpf_dynptr_alloc(0, 0, &ptr);
+
+	data = bpf_dynptr_data(&ptr, 0, 0);
+	if (!data)
+		goto done;
+
+	/* this should fail */
+	*(__u8 *)data = 23;
+
+	val = *(__u8 *)data;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c
new file mode 100644
index 000000000000..e23ece559f56
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Facebook */
+
+#include <string.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "errno.h"
+
+char _license[] SEC("license") = "GPL";
+
+int pid = 0;
+int err = 0;
+int val;
+
+struct sample {
+	int pid;
+	int seq;
+	long value;
+	char comm[16];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, 1 << 12);
+} ringbuf SEC(".maps");
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_basic(void *ctx)
+{
+	char write_data[64] = "hello there, world!!";
+	char read_data[64] = {}, buf[64] = {};
+	struct bpf_dynptr ptr = {};
+	int i;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_dynptr_from_mem(buf, sizeof(buf), 0, &ptr);
+	if (err)
+		return 0;
+
+	/* Write data into the dynptr */
+	err = bpf_dynptr_write(&ptr, 0, write_data, sizeof(write_data));
+	if (err)
+		return 0;
+
+	/* Read the data that was written into the dynptr */
+	err = bpf_dynptr_read(read_data, sizeof(read_data), &ptr, 0);
+	if (err)
+		return 0;
+
+	/* Ensure the data we read matches the data we wrote */
+	for (i = 0; i < sizeof(read_data); i++) {
+		if (read_data[i] != write_data[i]) {
+			err = 1;
+			return 0;
+		}
+	}
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_data_slice(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	__u32 alloc_size = 16;
+	void *data;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	/* test passing in an invalid flag */
+	err = bpf_dynptr_alloc(alloc_size, 1, &ptr);
+	if (err != -EINVAL) {
+		err = 1;
+		goto done;
+	}
+	bpf_dynptr_put(&ptr);
+
+	err = bpf_dynptr_alloc(alloc_size, 0, &ptr);
+	if (err)
+		goto done;
+
+	/* Try getting a data slice that is out of range */
+	data = bpf_dynptr_data(&ptr, alloc_size + 1, 1);
+	if (data) {
+		err = 2;
+		goto done;
+	}
+
+	/* Try getting more bytes than available */
+	data = bpf_dynptr_data(&ptr, 0, alloc_size + 1);
+	if (data) {
+		err = 3;
+		goto done;
+	}
+
+	data = bpf_dynptr_data(&ptr, 0, sizeof(int));
+	if (!data) {
+		err = 4;
+		goto done;
+	}
+
+	*(__u32 *)data = 999;
+
+	err = bpf_probe_read_kernel(&val, sizeof(val), data);
+	if (err)
+		goto done;
+
+	if (val != *(int *)data)
+		err = 5;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
+
+static int ringbuf_callback(__u32 index, void *data)
+{
+	struct sample *sample;
+
+	struct bpf_dynptr *ptr = (struct bpf_dynptr *)data;
+
+	sample = bpf_dynptr_data(ptr, 0, sizeof(*sample));
+	if (!sample) {
+		err = 2;
+		return 0;
+	}
+
+	sample->pid += val;
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_ringbuf(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	struct sample *sample;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	val = 100;
+
+	/* check that you can reserve a dynamic size reservation */
+	err = bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &ptr);
+	if (err)
+		goto done;
+
+	sample = bpf_dynptr_data(&ptr, 0, sizeof(*sample));
+	if (!sample) {
+		err = 1;
+		goto done;
+	}
+
+	sample->pid = 123;
+
+	/* Can pass dynptr to callback functions */
+	bpf_loop(10, ringbuf_callback, &ptr, 0);
+
+	bpf_ringbuf_submit_dynptr(&ptr, 0);
+
+	return 0;
+
+done:
+	bpf_ringbuf_discard_dynptr(&ptr, 0);
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int test_alloc_zero_bytes(void *ctx)
+{
+	struct bpf_dynptr ptr;
+	void *data;
+	__u8 x = 0;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	err = bpf_dynptr_alloc(0, 0, &ptr);
+	if (err)
+		goto done;
+
+	err = bpf_dynptr_write(&ptr, 0, &x, sizeof(x));
+	if (err != -EINVAL) {
+		err = 1;
+		goto done;
+	}
+
+	err = bpf_dynptr_read(&x, sizeof(x), &ptr, 0);
+	if (err != -EINVAL) {
+		err = 2;
+		goto done;
+	}
+
+	/* try to access memory we don't have access to */
+	data = bpf_dynptr_data(&ptr, 0, 1);
+	if (data) {
+		err = 3;
+		goto done;
+	}
+
+	data = bpf_dynptr_data(&ptr, 0, 0);
+	if (!data) {
+		err = 4;
+		goto done;
+	}
+
+	err = 0;
+
+done:
+	bpf_dynptr_put(&ptr);
+	return 0;
+}
-- 
2.30.2



* Re: [PATCH bpf-next v2 0/7] Dynamic pointers
  2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
                   ` (6 preceding siblings ...)
  2022-04-16  6:34 ` [PATCH bpf-next v2 7/7] bpf: Dynptr tests Joanne Koong
@ 2022-04-16  8:13 ` Kumar Kartikeya Dwivedi
  2022-04-16  8:19   ` Kumar Kartikeya Dwivedi
  7 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-16  8:13 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, ast, daniel, toke

On Sat, Apr 16, 2022 at 12:04:22PM IST, Joanne Koong wrote:
> This patchset implements the basics of dynamic pointers in bpf.
>
> A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> alongside the address it points to. This abstraction is useful in bpf, given
> that every memory access in a bpf program must be safe. The verifier and bpf
> helper functions can use the metadata to enforce safety guarantees for things
> such as dynamically sized strings and kernel heap allocations.
>
> From the program side, the bpf_dynptr is an opaque struct and the verifier
> will enforce that its contents are never written to by the program.
> It can only be written to through specific bpf helper functions.
>
> There are several uses cases for dynamic pointers in bpf programs. A list of
> some are: dynamically sized ringbuf reservations without any extra memcpys,
> dynamic string parsing and memory comparisons, dynamic memory allocations that
> can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> data.
>
> At a high-level, the patches are as follows:
> 1/7 - Adds MEM_UNINIT as a bpf_type_flag
> 2/7 - Adds MEM_RELEASE as a bpf_type_flag
> 3/7 - Adds bpf_dynptr_from_mem, bpf_dynptr_alloc, and bpf_dynptr_put
> 4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
> 5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
> 6/7 - Adds dynptr support for ring buffers
> 7/7 - Tests to check that verifier rejects certain fail cases and passes
> certain success cases
>
> This is the first dynptr patchset in a larger series. The next series of
> patches will add persisting dynamic memory allocations in maps, parsing packet
> data through dynptrs, dynptrs to referenced objects, convenience helpers for
> using dynptrs as iterators, and more helper functions for interacting with
> strings and memory dynamically.
>

test_verifier has 5 failed tests; the following diff fixes them (three due to
changed verifier error strings, and two because the offset checks for
ARG_PTR_TO_ALLOC_MEM were missed in check_func_arg_reg_off). Since this is
all, I guess you can wait for the review to complete for this version before
respinning.



> Changelog:
> ----------
> v1 -> v2:
> v1: https://lore.kernel.org/bpf/20220402015826.3941317-1-joannekoong@fb.com/
>
> 1/7 -
>     * Remove ARG_PTR_TO_MAP_VALUE_UNINIT alias and use
>       ARG_PTR_TO_MAP_VALUE | MEM_UNINIT directly (Andrii)
>     * Drop arg_type_is_mem_ptr() wrapper function (Andrii)
>
> 2/7 -
>     * Change name from MEM_RELEASE to OBJ_RELEASE (Andrii)
>     * Use meta.release_ref instead of ref_obj_id != 0 to determine whether
>       to release reference (Kumar)
>     * Drop type_is_release_mem() wrapper function (Andrii)
>
> 3/7 -
>     * Add checks for nested dynptrs edge-cases, which could lead to corrupt
>     * writes of the dynptr stack variable.
>     * Add u64 flags to bpf_dynptr_from_mem() and bpf_dynptr_alloc() (Andrii)
>     * Rename from bpf_malloc/bpf_free to bpf_dynptr_alloc/bpf_dynptr_put
>       (Alexei)
>     * Support alloc flag __GFP_ZERO (Andrii)
>     * Reserve upper 8 bits in dynptr size and offset fields instead of
>       reserving just the upper 4 bits (Andrii)
>     * Allow dynptr zero-slices (Andrii)
>     * Use the highest bit for is_rdonly instead of the 28th bit (Andrii)
>     * Rename check_* functions to is_* functions for better readability
>       (Andrii)
>     * Add comment for code that checks the spi bounds (Andrii)
>
> 4/7 -
>     * Fix doc description for bpf_dynpt_read (Toke)
>     * Move bpf_dynptr_check_off_len() from function patch 1 to here (Andrii)
>
> 5/7 -
>     * When finding the id for the dynptr to associate the data slice with,
>       look for dynptr arg instead of assuming it is BPF_REG_1.
>
> 6/7 -
>     * Add __force when casting from unsigned long to void * (kernel test robot)
>     * Expand on docs for ringbuf dynptr APIs (Andrii)
>
> 7/7 -
>     * Use table approach for defining test programs and error messages (Andrii)
>     * Print out full log if there’s an error (Andrii)
>     * Use bpf_object__find_program_by_name() instead of specifying
>       program name as a string (Andrii)
>     * Add 6 extra cases: invalid_nested_dynptrs1, invalid_nested_dynptrs2,
>       invalid_ref_mem1, invalid_ref_mem2, zero_slice_access,
>       and test_alloc_zero_bytes
>     * Add checking for edge cases (eg allocing with invalid flags)
>
> Joanne Koong (7):
>   bpf: Add MEM_UNINIT as a bpf_type_flag
>   bpf: Add OBJ_RELEASE as a bpf_type_flag
>   bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
>   bpf: Add bpf_dynptr_read and bpf_dynptr_write
>   bpf: Add dynptr data slices
>   bpf: Dynptr support for ring buffers
>   bpf: Dynptr tests
>
>  include/linux/bpf.h                           | 109 ++-
>  include/linux/bpf_verifier.h                  |  33 +-
>  include/uapi/linux/bpf.h                      | 110 +++
>  kernel/bpf/bpf_lsm.c                          |   4 +-
>  kernel/bpf/btf.c                              |   3 +-
>  kernel/bpf/cgroup.c                           |   4 +-
>  kernel/bpf/helpers.c                          | 212 +++++-
>  kernel/bpf/ringbuf.c                          |  75 +-
>  kernel/bpf/stackmap.c                         |   6 +-
>  kernel/bpf/verifier.c                         | 538 +++++++++++++--
>  kernel/trace/bpf_trace.c                      |  30 +-
>  net/core/filter.c                             |  28 +-
>  scripts/bpf_doc.py                            |   2 +
>  tools/include/uapi/linux/bpf.h                | 110 +++
>  .../testing/selftests/bpf/prog_tests/dynptr.c | 138 ++++
>  .../testing/selftests/bpf/progs/dynptr_fail.c | 643 ++++++++++++++++++
>  .../selftests/bpf/progs/dynptr_success.c      | 217 ++++++
>  17 files changed, 2148 insertions(+), 114 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/dynptr.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_fail.c
>  create mode 100644 tools/testing/selftests/bpf/progs/dynptr_success.c
>
> --
> 2.30.2
>

--
Kartikeya


* Re: [PATCH bpf-next v2 0/7] Dynamic pointers
  2022-04-16  8:13 ` [PATCH bpf-next v2 0/7] Dynamic pointers Kumar Kartikeya Dwivedi
@ 2022-04-16  8:19   ` Kumar Kartikeya Dwivedi
  2022-04-18 16:40     ` Joanne Koong
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-16  8:19 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, ast, daniel, toke

On Sat, Apr 16, 2022 at 01:43:41PM IST, Kumar Kartikeya Dwivedi wrote:
> On Sat, Apr 16, 2022 at 12:04:22PM IST, Joanne Koong wrote:
> > This patchset implements the basics of dynamic pointers in bpf.
> >
> > A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> > alongside the address it points to. This abstraction is useful in bpf, given
> > that every memory access in a bpf program must be safe. The verifier and bpf
> > helper functions can use the metadata to enforce safety guarantees for things
> > such as dynamically sized strings and kernel heap allocations.
> >
> > From the program side, the bpf_dynptr is an opaque struct and the verifier
> > will enforce that its contents are never written to by the program.
> > It can only be written to through specific bpf helper functions.
> >
> > There are several use cases for dynamic pointers in bpf programs. A list of
> > some are: dynamically sized ringbuf reservations without any extra memcpys,
> > dynamic string parsing and memory comparisons, dynamic memory allocations that
> > can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> > data.
> >
> > At a high level, the patches are as follows:
> > 1/7 - Adds MEM_UNINIT as a bpf_type_flag
> > 2/7 - Adds MEM_RELEASE as a bpf_type_flag
> > 3/7 - Adds bpf_dynptr_from_mem, bpf_dynptr_alloc, and bpf_dynptr_put
> > 4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
> > 5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
> > 6/7 - Adds dynptr support for ring buffers
> > 7/7 - Tests to check that the verifier rejects certain fail cases and passes
> > certain success cases
> >
> > This is the first dynptr patchset in a larger series. The next series of
> > patches will add persisting dynamic memory allocations in maps, parsing packet
> > data through dynptrs, dynptrs to referenced objects, convenience helpers for
> > using dynptrs as iterators, and more helper functions for interacting with
> > strings and memory dynamically.
> >
>
> test_verifier has 5 failed tests; the following diff fixes them (three due to a
> changed verifier error string, and two because we missed doing offset checks for
> ARG_PTR_TO_ALLOC_MEM in check_func_arg_reg_off). Since this is all, I guess you
> can wait for the review to complete for this version before respinning.
>

Ugh, hit send too early.

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bf64946ced84..24e5d494d991 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5681,7 +5681,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		/* Some of the argument types nevertheless require a
 		 * zero register offset.
 		 */
-		if (arg_type != ARG_PTR_TO_ALLOC_MEM)
+		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
 			return 0;
 		break;
 	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index fbd682520e47..f1ad3b3cc145 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -796,7 +796,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "arg 1 is an unacquired reference",
 },
 {
 	/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
index 86b24cad27a7..055a61205906 100644
--- a/tools/testing/selftests/bpf/verifier/sock.c
+++ b/tools/testing/selftests/bpf/verifier/sock.c
@@ -417,7 +417,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "arg 1 is an unacquired reference",
 },
 {
 	"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
@@ -436,7 +436,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "arg 1 is an unacquired reference",
 },
 {
 	"bpf_sk_release(bpf_tcp_sock(skb->sk))",
@@ -455,7 +455,7 @@
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.result = REJECT,
-	.errstr = "reference has not been acquired before",
+	.errstr = "arg 1 is an unacquired reference",
 },
 {
 	"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",

> [...]

--
Kartikeya

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
@ 2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
  2022-04-18 22:20     ` Joanne Koong
  2022-04-19 20:35   ` Kumar Kartikeya Dwivedi
  2022-04-22  2:52   ` Alexei Starovoitov
  2 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-16 17:42 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, ast, daniel, toke

On Sat, Apr 16, 2022 at 12:04:25PM IST, Joanne Koong wrote:
> This patch adds 3 new APIs and the bulk of the verifier work for
> supporting dynamic pointers in bpf.
>
> There are different types of dynptrs. This patch starts with the most
> basic ones, ones that reference a program's local memory
> (eg a stack variable) and ones that reference memory that is dynamically
> allocated on behalf of the program. If the memory is dynamically
> allocated by the program, the program *must* free it before the program
> exits. This is enforced by the verifier.
>
> The added APIs are:
>
> long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
> long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_dynptr_put(struct bpf_dynptr *ptr);
>
> This patch sets up the verifier to support dynptrs. Dynptrs will always
> reside on the program's stack frame. As such, their state is tracked
> in their corresponding stack slot, which includes the type of dynptr
> (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
>
> When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> MEM_UNINIT), the stack slots corresponding to the frame pointer
> > where the dynptr resides are marked as STACK_DYNPTR. For helper functions
> that take in initialized dynptrs (such as the next patch in this series
> which supports dynptr reads/writes), the verifier enforces that the
> dynptr has been initialized by checking that their corresponding stack
> slots have been marked as STACK_DYNPTR. Dynptr release functions
> (eg bpf_dynptr_put) will clear the stack slots. The verifier enforces at
> program exit that there are no acquired dynptr stack slots that need
> to be released.
>
> There are other constraints that are enforced by the verifier as
> well, such as that the dynptr cannot be written to directly by the bpf
> program or by non-dynptr helper functions. The last patch in this series
> contains tests that trigger different cases that the verifier needs to
> successfully reject.
>
> For now, local dynptrs cannot point to referenced memory since the
> memory can be freed anytime. Support for this will be added as part
> of a separate patchset.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  68 +++++-
>  include/linux/bpf_verifier.h   |  28 +++
>  include/uapi/linux/bpf.h       |  44 ++++
>  kernel/bpf/helpers.c           | 110 ++++++++++
>  kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  44 ++++
>  7 files changed, 654 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 29964cdb1dd6..fee91b07ee74 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -346,7 +346,16 @@ enum bpf_type_flag {
>
>  	OBJ_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
>
> -	__BPF_TYPE_LAST_FLAG	= OBJ_RELEASE,
> +	/* DYNPTR points to a program's local memory (eg stack variable). */
> +	DYNPTR_TYPE_LOCAL	= BIT(7 + BPF_BASE_TYPE_BITS),
> +
> +	/* DYNPTR points to dynamically allocated memory. */
> +	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
> +
> +	/* May not be a referenced object */
> +	NO_OBJ_REF		= BIT(9 + BPF_BASE_TYPE_BITS),
> +
> +	__BPF_TYPE_LAST_FLAG	= NO_OBJ_REF,
>  };
>
>  /* Max number of base types. */
> @@ -390,6 +399,7 @@ enum bpf_arg_type {
>  	ARG_PTR_TO_STACK,	/* pointer to stack */
>  	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
>  	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
> +	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
>  	__BPF_ARG_TYPE_MAX,
>
>  	/* Extended arg_types. */
> @@ -2394,4 +2404,60 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
>  			u32 **bin_buf, u32 num_args);
>  void bpf_bprintf_cleanup(void);
>
> +/* the implementation of the opaque uapi struct bpf_dynptr */
> +struct bpf_dynptr_kern {
> +	void *data;
> +	/* Size represents the number of usable bytes in the dynptr.
> +	 * If for example the offset is at 200 for a malloc dynptr with
> +	 * allocation size 256, the number of usable bytes is 56.
> +	 *
> +	 * The upper 8 bits are reserved.
> +	 * Bit 31 denotes whether the dynptr is read-only.
> +	 * Bits 28-30 denote the dynptr type.
> +	 */
> +	u32 size;
> +	u32 offset;
> +} __aligned(8);
> +
> +enum bpf_dynptr_type {
> +	BPF_DYNPTR_TYPE_INVALID,
> +	/* Local memory used by the bpf program (eg stack variable) */
> +	BPF_DYNPTR_TYPE_LOCAL,
> +	/* Memory allocated dynamically by the kernel for the dynptr */
> +	BPF_DYNPTR_TYPE_MALLOC,
> +};
> +
> > +/* Since the upper 8 bits of dynptr->size are reserved, the
> + * maximum supported size is 2^24 - 1.
> + */
> +#define DYNPTR_MAX_SIZE	((1UL << 24) - 1)
> +#define DYNPTR_SIZE_MASK	0xFFFFFF
> +#define DYNPTR_TYPE_SHIFT	28
> +#define DYNPTR_TYPE_MASK	0x7
> +
> +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> +{
> +	return (ptr->size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK;
> +}
> +
> +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> +{
> +	ptr->size |= type << DYNPTR_TYPE_SHIFT;
> +}
> +
> +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> +{
> +	return ptr->size & DYNPTR_SIZE_MASK;
> +}
> +
> +static inline int bpf_dynptr_check_size(u32 size)
> +{
> +	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
> +}
> +
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +		     u32 offset, u32 size);
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 7a01adc9e13f..e11440a44e92 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -72,6 +72,27 @@ struct bpf_reg_state {
>
>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
>
> +		/* For dynptr stack slots */
> +		struct {
> +			enum bpf_dynptr_type type;
> +			/* A dynptr is 16 bytes so it takes up 2 stack slots.
> +			 * We need to track which slot is the first slot
> +			 * to protect against cases where the user may try to
> +			 * pass in an address starting at the second slot of the
> +			 * dynptr.
> +			 */
> +			bool first_slot;
> +		} dynptr;
> +		/* For stack slots that a local dynptr points to. We need to track
> +		 * this to prohibit programs from using stack variables that are
> +		 * pointed to by dynptrs as a dynptr, eg something like
> +		 *
> > +		 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> +		 * bpf_dynptr_alloc(16, 0, &ptr);
> +		 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> +		 */
> +		bool is_dynptr_data;
> +
>  		/* Max size from any of the above. */
>  		struct {
>  			unsigned long raw1;
> @@ -174,9 +195,16 @@ enum bpf_stack_slot_type {
>  	STACK_SPILL,      /* register spilled into stack */
>  	STACK_MISC,	  /* BPF program wrote some data into this slot */
>  	STACK_ZERO,	  /* BPF program wrote constant zero */
> +	/* A dynptr is stored in this stack slot. The type of dynptr
> +	 * is stored in bpf_stack_state->spilled_ptr.dynptr.type
> +	 */
> +	STACK_DYNPTR,
>  };
>
>  #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
> +/* size of a struct bpf_dynptr in bytes */
> +#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
> +#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
>
>  struct bpf_stack_state {
>  	struct bpf_reg_state spilled_ptr;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d14b10b85e51..e339b2697d9a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5143,6 +5143,42 @@ union bpf_attr {
>   *		The **hash_algo** is returned on success,
>   *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
>   *		invalid arguments are passed.
> + *
> + * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Get a dynptr to local memory *data*.
> + *
> + *		For a dynptr to a dynamic memory allocation, please use
> + *		bpf_dynptr_alloc instead.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		*flags* is currently unused.
> + *	Return
> + *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
> + *		-EINVAL if flags is not 0.
> + *
> + * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Allocate memory of *size* bytes.
> + *
> + *		Every call to bpf_dynptr_alloc must have a corresponding
> + *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
> + *		succeeded.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		Supported *flags* are __GFP_ZERO.
> + *	Return
> + *		0 on success, -ENOMEM if there is not enough memory for the
> + *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
> > + *		if the flags are not supported.
> + *
> + * void bpf_dynptr_put(struct bpf_dynptr *ptr)
> + *	Description
> + *		Free memory allocated by bpf_dynptr_alloc.
> + *
> + *		After this operation, *ptr* will be an invalidated dynptr.
> + *	Return
> + *		Void.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5339,6 +5375,9 @@ union bpf_attr {
>  	FN(copy_from_user_task),	\
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
> +	FN(dynptr_from_mem),		\
> +	FN(dynptr_alloc),		\
> +	FN(dynptr_put),			\
>  	/* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6486,6 +6525,11 @@ struct bpf_timer {
>  	__u64 :64;
>  } __attribute__((aligned(8)));
>
> +struct bpf_dynptr {
> +	__u64 :64;
> +	__u64 :64;
> +} __attribute__((aligned(8)));
> +
>  struct bpf_sysctl {
>  	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
>  				 * Allows 1,2,4-byte read, but no write.
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a47aae5c7335..87c14edda315 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1374,6 +1374,110 @@ void bpf_timer_cancel_and_free(void *val)
>  	kfree(t);
>  }
>
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +		     u32 offset, u32 size)
> +{
> +	ptr->data = data;
> +	ptr->offset = offset;
> +	ptr->size = size;
> +	bpf_dynptr_set_type(ptr, type);
> +}
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
> +{
> +	memset(ptr, 0, sizeof(*ptr));
> +}
> +
> +BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
> +{
> +	int err;
> +
> +	err = bpf_dynptr_check_size(size);
> +	if (err)
> +		goto error;
> +
> +	/* flags is currently unsupported */
> +	if (flags) {
> +		err = -EINVAL;
> +		goto error;
> +	}
> +
> +	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_LOCAL, 0, size);
> +
> +	return 0;
> +
> +error:
> +	bpf_dynptr_set_null(ptr);
> +	return err;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> +	.func		= bpf_dynptr_from_mem,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_MEM_UNINIT | NO_OBJ_REF,
> +	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
> +	.arg3_type	= ARG_ANYTHING,
> +	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> +};
> +
> +BPF_CALL_3(bpf_dynptr_alloc, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
> +{
> +	gfp_t gfp_flags = GFP_ATOMIC;
> +	void *data;
> +	int err;
> +
> +	err = bpf_dynptr_check_size(size);
> +	if (err)
> +		goto error;
> +
> +	if (flags) {
> +		if (flags == __GFP_ZERO) {
> +			gfp_flags |= flags;
> +		} else {
> +			err = -EINVAL;
> +			goto error;
> +		}
> +	}
> +
> +	data = kmalloc(size, gfp_flags);
> +	if (!data) {
> +		err = -ENOMEM;
> +		goto error;
> +	}
> +
> +	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
> +
> +	return 0;
> +
> +error:
> +	bpf_dynptr_set_null(ptr);
> +	return err;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_alloc_proto = {
> +	.func		= bpf_dynptr_alloc,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_ANYTHING,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
> +};
> +
> +BPF_CALL_1(bpf_dynptr_put, struct bpf_dynptr_kern *, dynptr)
> +{
> +	kfree(dynptr->data);
> +	bpf_dynptr_set_null(dynptr);
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_put_proto = {
> +	.func		= bpf_dynptr_put,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
> +};
> +
>  const struct bpf_func_proto bpf_get_current_task_proto __weak;
>  const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
>  const struct bpf_func_proto bpf_probe_read_user_proto __weak;
> @@ -1426,6 +1530,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>  		return &bpf_loop_proto;
>  	case BPF_FUNC_strncmp:
>  		return &bpf_strncmp_proto;
> +	case BPF_FUNC_dynptr_from_mem:
> +		return &bpf_dynptr_from_mem_proto;
> +	case BPF_FUNC_dynptr_alloc:
> +		return &bpf_dynptr_alloc_proto;
> +	case BPF_FUNC_dynptr_put:
> +		return &bpf_dynptr_put_proto;
>  	default:
>  		break;
>  	}
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8deb588a19ce..bf132c6822e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -187,6 +187,9 @@ struct bpf_verifier_stack_elem {
>  					  POISON_POINTER_DELTA))
>  #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>
> +/* forward declarations */
> +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> +
>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>  {
>  	return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
> @@ -257,7 +260,9 @@ struct bpf_call_arg_meta {
>  	struct btf *ret_btf;
>  	u32 ret_btf_id;
>  	u32 subprogno;
> -	bool release_ref;
> +	u8 release_regno;
> +	bool release_dynptr;
> +	u8 uninit_dynptr_regno;
>  };
>
>  struct btf *btf_vmlinux;
> @@ -576,6 +581,7 @@ static char slot_type_char[] = {
>  	[STACK_SPILL]	= 'r',
>  	[STACK_MISC]	= 'm',
>  	[STACK_ZERO]	= '0',
> +	[STACK_DYNPTR]	= 'd',
>  };
>
>  static void print_liveness(struct bpf_verifier_env *env,
> @@ -591,6 +597,25 @@ static void print_liveness(struct bpf_verifier_env *env,
>  		verbose(env, "D");
>  }
>
> +static inline int get_spi(s32 off)
> +{
> +	return (-off - 1) / BPF_REG_SIZE;
> +}
> +
> +static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, u32 nr_slots)
> +{
> +	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +	/* We need to check that slots between [spi - nr_slots + 1, spi] are
> +	 * within [0, allocated_stack).
> +	 *
> +	 * Please note that the spi grows downwards. For example, a dynptr
> +	 * takes the size of two stack slots; the first slot will be at
> +	 * spi and the second slot will be at spi - 1.
> +	 */
> +	return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
> +}
> +
>  static struct bpf_func_state *func(struct bpf_verifier_env *env,
>  				   const struct bpf_reg_state *reg)
>  {
> @@ -642,6 +667,191 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
>  	env->scratched_stack_slots = ~0ULL;
>  }
>
> +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC)
> +
> +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> +{
> +	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +	case DYNPTR_TYPE_LOCAL:
> +		return BPF_DYNPTR_TYPE_LOCAL;
> +	case DYNPTR_TYPE_MALLOC:
> +		return BPF_DYNPTR_TYPE_MALLOC;
> +	default:
> +		return BPF_DYNPTR_TYPE_INVALID;
> +	}
> +}
> +
> +static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
> +{
> +	return type == BPF_DYNPTR_TYPE_MALLOC;
> +}
> +
> +static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				   enum bpf_arg_type arg_type)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	enum bpf_dynptr_type type;
> +	int spi, i;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return -EINVAL;
> +
> +	type = arg_to_dynptr_type(arg_type);
> +	if (type == BPF_DYNPTR_TYPE_INVALID)
> +		return -EINVAL;
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++) {
> +		state->stack[spi].slot_type[i] = STACK_DYNPTR;
> +		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> +	}
> +
> +	state->stack[spi].spilled_ptr.dynptr.type = type;
> +	state->stack[spi - 1].spilled_ptr.dynptr.type = type;
> +
> +	state->stack[spi].spilled_ptr.dynptr.first_slot = true;
> +
> +	return 0;
> +}
> +
> +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi, i;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return -EINVAL;
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++) {
> +		state->stack[spi].slot_type[i] = STACK_INVALID;
> +		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> +	}
> +
> +	state->stack[spi].spilled_ptr.dynptr.type = 0;
> +	state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> +
> +	state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
> +
> +	return 0;
> +}
> +
> +static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
> +			       struct bpf_reg_state *regs)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	struct bpf_reg_state *reg, *mem_reg = NULL;
> +	enum bpf_arg_type arg_type;
> +	u64 mem_size;
> +	u32 nr_slots;
> +	int i, spi;
> +
> +	/* We must protect against the case where a program tries to do something
> +	 * like this:
> +	 *
> +	 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> +	 * bpf_dynptr_alloc(16, 0, &ptr);
> +	 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> +	 *
> +	 * If ptr is a variable on the stack, we must mark the stack slot as
> +	 * dynptr data when a local dynptr to it is created.
> +	 */
> +	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> +		arg_type = fn->arg_type[i];
> +		reg = &regs[BPF_REG_1 + i];
> +
> +		if (base_type(arg_type) == ARG_PTR_TO_MEM) {
> +			if (base_type(reg->type) == PTR_TO_STACK) {
> +				mem_reg = reg;
> +				continue;
> +			}
> +			/* if it's not a PTR_TO_STACK, then we don't need to
> +			 * mark anything since it can never be used as a dynptr.
> +			 * We can just return here since there will always be
> +			 * only one ARG_PTR_TO_MEM in fn.
> +			 */
> +			return 0;
> +		} else if (arg_type_is_mem_size(arg_type)) {
> +			mem_size = roundup(reg->var_off.value, BPF_REG_SIZE);
> +		}
> +	}
> +
> +	if (!mem_reg || !mem_size) {
> +		verbose(env, "verifier internal error: invalid ARG_PTR_TO_MEM args for %s\n", __func__);
> +		return -EFAULT;
> +	}
> +
> +	spi = get_spi(mem_reg->off);
> +	if (!is_spi_bounds_valid(state, spi, mem_size)) {
> +		verbose(env, "verifier internal error: variable not initialized on stack in %s\n", __func__);
> +		return -EFAULT;
> +	}
> +
> +	nr_slots = mem_size / BPF_REG_SIZE;
> +	for (i = 0; i < nr_slots; i++)
> +		state->stack[spi - i].spilled_ptr.is_dynptr_data = true;
> +
> +	return 0;
> +}
> +
> +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				       bool *is_dynptr_data)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return true;
> +
> +	if (state->stack[spi].slot_type[0] == STACK_DYNPTR ||
> +	    state->stack[spi - 1].slot_type[0] == STACK_DYNPTR)
> +		return false;
> +
> +	if (state->stack[spi].spilled_ptr.is_dynptr_data ||
> +	    state->stack[spi - 1].spilled_ptr.is_dynptr_data) {
> +		*is_dynptr_data = true;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				     enum bpf_arg_type arg_type)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +	    state->stack[spi].slot_type[0] != STACK_DYNPTR ||
> +	    state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
> +	    !state->stack[spi].spilled_ptr.dynptr.first_slot)
> +		return false;
> +
> +	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> +	if (arg_type == ARG_PTR_TO_DYNPTR)
> +		return true;
> +
> +	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> +}
> +
> +static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
> +{
> +	int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
> +	int i;
> +
> +	for (i = 0; i < nr_slots; i++) {
> +		if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /* The reg state of a pointer or a bounded scalar was saved when
>   * it was spilled to the stack.
>   */
> @@ -2878,6 +3088,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>  	}
>
>  	mark_stack_slot_scratched(env, spi);
> +
> +	if (stack_access_into_dynptr(state, spi, size)) {
> +		verbose(env, "direct write into dynptr is not permitted\n");
> +		return -EINVAL;
> +	}
> +
>  	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
>  	    !register_is_null(reg) && env->bpf_capable) {
>  		if (dst_reg != BPF_REG_FP) {
> @@ -2999,6 +3215,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>  		slot = -i - 1;
>  		spi = slot / BPF_REG_SIZE;
>  		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> +
> +		if (*stype == STACK_DYNPTR) {
> +			verbose(env, "direct write into dynptr is not permitted\n");
> +			return -EINVAL;
> +		}
> +
>  		mark_stack_slot_scratched(env, spi);
>
>  		if (!env->allow_ptr_leaks
> @@ -5141,6 +5363,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
>  	       type == ARG_PTR_TO_LONG;
>  }
>
> +static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
> +{
> +	return base_type(type) == ARG_PTR_TO_DYNPTR;
> +}
> +
> +static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
> +{
> +	return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
> +}
> +
>  static int int_ptr_type_to_size(enum bpf_arg_type type)
>  {
>  	if (type == ARG_PTR_TO_INT)
> @@ -5278,6 +5510,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
>  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
>  	[ARG_PTR_TO_TIMER]		= &timer_types,
> +	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
>  };
>
>  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> @@ -5450,10 +5683,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		return err;
>
>  skip_type_check:
> -	/* check_func_arg_reg_off relies on only one referenced register being
> -	 * allowed for BPF helpers.
> -	 */
>  	if (reg->ref_obj_id) {
> +		if (arg_type & NO_OBJ_REF) {
> +			verbose(env, "Arg #%d cannot be a referenced object\n",
> +				arg + 1);
> +			return -EINVAL;
> +		}
> +
> +		/* check_func_arg_reg_off relies on only one referenced register being
> +		 * allowed for BPF helpers.
> +		 */
>  		if (meta->ref_obj_id) {
>  			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
>  				regno, reg->ref_obj_id,
> @@ -5463,16 +5702,26 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		meta->ref_obj_id = reg->ref_obj_id;
>  	}
>  	if (arg_type & OBJ_RELEASE) {
> -		if (!reg->ref_obj_id) {
> +		if (arg_type_is_dynptr(arg_type)) {
> +			struct bpf_func_state *state = func(env, reg);
> +			int spi = get_spi(reg->off);
> +
> +			if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +			    !state->stack[spi].spilled_ptr.id) {
> +				verbose(env, "arg %d is an unacquired reference\n", regno);
> +				return -EINVAL;
> +			}
> +			meta->release_dynptr = true;
> +		} else if (!reg->ref_obj_id) {
>  			verbose(env, "arg %d is an unacquired reference\n", regno);
>  			return -EINVAL;
>  		}
> -		if (meta->release_ref) {
> -			verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
> -				regno);
> +		if (meta->release_regno) {
> +			verbose(env, "verifier internal error: more than one release_regno %u %u\n",
> +				meta->release_regno, regno);
>  			return -EFAULT;
>  		}
> -		meta->release_ref = true;
> +		meta->release_regno = regno;
>  	}
>
>  	if (arg_type == ARG_CONST_MAP_PTR) {
> @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>
>  		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> +	} else if (arg_type_is_dynptr(arg_type)) {
> +		/* Can't pass in a dynptr at a weird offset */
> +		if (reg->off % BPF_REG_SIZE) {

In invalid_helper2 test, you are passing &dynptr + 8, which means reg will be
fp-8 (assuming dynptr is at top of stack), get_spi will compute spi as 0, so
spi-1 will lead to OOB access for the second dynptr stack slot. If you run the
dynptr test under KASAN, you should see a warning for this.

So we should ensure here that reg->off is at least -16.

> +			verbose(env, "cannot pass in non-zero dynptr offset\n");
> +			return -EINVAL;
> +		}
> +
> +		if (arg_type & MEM_UNINIT)  {
> +			bool is_dynptr_data = false;
> +
> +			if (!is_dynptr_reg_valid_uninit(env, reg, &is_dynptr_data)) {
> +				if (is_dynptr_data)
> +					verbose(env, "Arg #%d cannot be a memory reference for another dynptr\n",
> +						arg + 1);
> +				else
> +					verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> +						arg + 1);
> +				return -EINVAL;
> +			}
> +
> +			meta->uninit_dynptr_regno = arg + BPF_REG_1;
> +		} else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> +			const char *err_extra = "";
> +
> +			switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +			case DYNPTR_TYPE_LOCAL:
> +				err_extra = "local ";
> +				break;
> +			case DYNPTR_TYPE_MALLOC:
> +				err_extra = "malloc ";
> +				break;
> +			default:
> +				break;
> +			}
> +			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> +				err_extra, arg + 1);
> +			return -EINVAL;
> +		}
>  	} else if (arg_type_is_alloc_size(arg_type)) {
>  		if (!tnum_is_const(reg->var_off)) {
>  			verbose(env, "R%d is not a known constant'\n",
> @@ -6545,6 +6832,28 @@ static int check_reference_leak(struct bpf_verifier_env *env)
>  	return state->acquired_refs ? -EINVAL : 0;
>  }
>
> +/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
> + * not been released. Dynptrs to local memory do not need to be released.
> + */
> +static int check_dynptr_unreleased(struct bpf_verifier_env *env)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int allocated_slots, i;
> +
> +	allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +	for (i = 0; i < allocated_slots; i++) {
> +		if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
> +			if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
> +				verbose(env, "spi=%d is an unreleased dynptr\n", i);
> +				return -EINVAL;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
>  				   struct bpf_reg_state *regs)
>  {
> @@ -6686,8 +6995,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			return err;
>  	}
>
> -	if (meta.release_ref) {
> -		err = release_reference(env, meta.ref_obj_id);
> +	regs = cur_regs(env);
> +
> +	if (meta.uninit_dynptr_regno) {
> +		enum bpf_arg_type type;
> +
> +		/* we write BPF_W bits (4 bytes) at a time */
> +		for (i = 0; i < BPF_DYNPTR_SIZE; i += 4) {
> +			err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
> +					       i, BPF_W, BPF_WRITE, -1, false);
> +			if (err)
> +				return err;
> +		}
> +
> +		type = fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1];
> +
> +		err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno], type);
> +		if (err)
> +			return err;
> +
> +		if (type & DYNPTR_TYPE_LOCAL) {
> +			err = mark_as_dynptr_data(env, fn, regs);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	if (meta.release_regno) {
> +		if (meta.release_dynptr) {
> +			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
> +		} else {
> +			err = release_reference(env, meta.ref_obj_id);
> +		}
>  		if (err) {
>  			verbose(env, "func %s#%d reference has not been acquired before\n",
>  				func_id_name(func_id), func_id);
> @@ -6695,8 +7034,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  		}
>  	}
>
> -	regs = cur_regs(env);
> -
>  	switch (func_id) {
>  	case BPF_FUNC_tail_call:
>  		err = check_reference_leak(env);
> @@ -6704,6 +7041,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			verbose(env, "tail_call would lead to reference leak\n");
>  			return err;
>  		}
> +		err = check_dynptr_unreleased(env);
> +		if (err) {
> +			verbose(env, "tail_call would lead to dynptr memory leak\n");
> +			return err;
> +		}
>  		break;
>  	case BPF_FUNC_get_local_storage:
>  		/* check that flags argument in get_local_storage(map, flags) is 0,
> @@ -11696,6 +12038,10 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>
> +				err = check_dynptr_unreleased(env);
> +				if (err)
> +					return err;
> +
>  				if (state->curframe) {
>  					/* exit from nested function */
>  					err = prepare_func_exit(env, &env->insn_idx);
> diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
> index 096625242475..766dcbc73897 100755
> --- a/scripts/bpf_doc.py
> +++ b/scripts/bpf_doc.py
> @@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
>              'struct socket',
>              'struct file',
>              'struct bpf_timer',
> +            'struct bpf_dynptr',
>      ]
>      known_types = {
>              '...',
> @@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
>              'struct socket',
>              'struct file',
>              'struct bpf_timer',
> +            'struct bpf_dynptr',
>      }
>      mapped_types = {
>              'u8': '__u8',
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d14b10b85e51..e339b2697d9a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5143,6 +5143,42 @@ union bpf_attr {
>   *		The **hash_algo** is returned on success,
>   *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
>   *		invalid arguments are passed.
> + *
> + * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Get a dynptr to local memory *data*.
> + *
> + *		For a dynptr to a dynamic memory allocation, please use
> + *		bpf_dynptr_alloc instead.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		*flags* is currently unused.
> + *	Return
> + *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
> + *		-EINVAL if flags is not 0.
> + *
> + * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Allocate memory of *size* bytes.
> + *
> + *		Every call to bpf_dynptr_alloc must have a corresponding
> + *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
> + *		succeeded.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		Supported *flags* are __GFP_ZERO.
> + *	Return
> + *		0 on success, -ENOMEM if there is not enough memory for the
> + *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
> + *		if the flags is not supported.
> + *
> + * void bpf_dynptr_put(struct bpf_dynptr *ptr)
> + *	Description
> + *		Free memory allocated by bpf_dynptr_alloc.
> + *
> + *		After this operation, *ptr* will be an invalidated dynptr.
> + *	Return
> + *		Void.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5339,6 +5375,9 @@ union bpf_attr {
>  	FN(copy_from_user_task),	\
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
> +	FN(dynptr_from_mem),		\
> +	FN(dynptr_alloc),		\
> +	FN(dynptr_put),			\
>  	/* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6486,6 +6525,11 @@ struct bpf_timer {
>  	__u64 :64;
>  } __attribute__((aligned(8)));
>
> +struct bpf_dynptr {
> +	__u64 :64;
> +	__u64 :64;
> +} __attribute__((aligned(8)));
> +
>  struct bpf_sysctl {
>  	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
>  				 * Allows 1,2,4-byte read, but no write.
> --
> 2.30.2
>

--
Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 0/7] Dynamic pointers
  2022-04-16  8:19   ` Kumar Kartikeya Dwivedi
@ 2022-04-18 16:40     ` Joanne Koong
  0 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-18 16:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Sat, Apr 16, 2022 at 1:19 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, Apr 16, 2022 at 01:43:41PM IST, Kumar Kartikeya Dwivedi wrote:
> > On Sat, Apr 16, 2022 at 12:04:22PM IST, Joanne Koong wrote:
> > > This patchset implements the basics of dynamic pointers in bpf.
> > >
> > > A dynamic pointer (struct bpf_dynptr) is a pointer that stores extra metadata
> > > alongside the address it points to. This abstraction is useful in bpf, given
> > > that every memory access in a bpf program must be safe. The verifier and bpf
> > > helper functions can use the metadata to enforce safety guarantees for things
> > > such as dynamically sized strings and kernel heap allocations.
> > >
> > > From the program side, the bpf_dynptr is an opaque struct and the verifier
> > > will enforce that its contents are never written to by the program.
> > > It can only be written to through specific bpf helper functions.
> > >
> > > There are several uses cases for dynamic pointers in bpf programs. A list of
> > > some are: dynamically sized ringbuf reservations without any extra memcpys,
> > > dynamic string parsing and memory comparisons, dynamic memory allocations that
> > > can be persisted in a map, and dynamic parsing of sk_buff and xdp_md packet
> > > data.
> > >
> > > At a high-level, the patches are as follows:
> > > 1/7 - Adds MEM_UNINIT as a bpf_type_flag
> > > 2/7 - Adds MEM_RELEASE as a bpf_type_flag
> > > 3/7 - Adds bpf_dynptr_from_mem, bpf_dynptr_alloc, and bpf_dynptr_put
> > > 4/7 - Adds bpf_dynptr_read and bpf_dynptr_write
> > > 5/7 - Adds dynptr data slices (ptr to underlying dynptr memory)
> > > 6/7 - Adds dynptr support for ring buffers
> > > 7/7 - Tests to check that verifier rejects certain fail cases and passes
> > > certain success cases
> > >
> > > This is the first dynptr patchset in a larger series. The next series of
> > > patches will add persisting dynamic memory allocations in maps, parsing packet
> > > data through dynptrs, dynptrs to referenced objects, convenience helpers for
> > > using dynptrs as iterators, and more helper functions for interacting with
> > > strings and memory dynamically.
> > >
> >
> > test_verifier has 5 failed tests; the following diff fixes them (three for a
> > changed verifier error string, and two because we missed offset checks for
> > ARG_PTR_TO_ALLOC_MEM in check_func_arg_reg_off). Since this is all, I guess you
> > can wait for the review to complete for this version before respinning.
> >
>
> Ugh, hit send too early.
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bf64946ced84..24e5d494d991 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5681,7 +5681,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>                 /* Some of the argument types nevertheless require a
>                  * zero register offset.
>                  */
> -               if (arg_type != ARG_PTR_TO_ALLOC_MEM)
> +               if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
>                         return 0;
>                 break;
>         /* All the rest must be rejected, except PTR_TO_BTF_ID which allows
> diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> index fbd682520e47..f1ad3b3cc145 100644
> --- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
> +++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
> @@ -796,7 +796,7 @@
>         },
>         .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         .result = REJECT,
> -       .errstr = "reference has not been acquired before",
> +       .errstr = "arg 1 is an unacquired reference",
>  },
>  {
>         /* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
> diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
> index 86b24cad27a7..055a61205906 100644
> --- a/tools/testing/selftests/bpf/verifier/sock.c
> +++ b/tools/testing/selftests/bpf/verifier/sock.c
> @@ -417,7 +417,7 @@
>         },
>         .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         .result = REJECT,
> -       .errstr = "reference has not been acquired before",
> +       .errstr = "arg 1 is an unacquired reference",
>  },
>  {
>         "bpf_sk_release(bpf_sk_fullsock(skb->sk))",
> @@ -436,7 +436,7 @@
>         },
>         .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         .result = REJECT,
> -       .errstr = "reference has not been acquired before",
> +       .errstr = "arg 1 is an unacquired reference",
>  },
>  {
>         "bpf_sk_release(bpf_tcp_sock(skb->sk))",
> @@ -455,7 +455,7 @@
>         },
>         .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>         .result = REJECT,
> -       .errstr = "reference has not been acquired before",
> +       .errstr = "arg 1 is an unacquired reference",
>  },
>  {
>         "sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
>
> > [...]
Awesome, thanks for noting this, Kumar! I'll make sure to locally run
the verifier tests before I submit the next iteration upstream.
>
> --
> Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
@ 2022-04-18 22:20     ` Joanne Koong
  2022-04-18 23:57       ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 27+ messages in thread
From: Joanne Koong @ 2022-04-18 22:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Sat, Apr 16, 2022 at 10:42 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, Apr 16, 2022 at 12:04:25PM IST, Joanne Koong wrote:
> > This patch adds 3 new APIs and the bulk of the verifier work for
> > supporting dynamic pointers in bpf.
> >
> > There are different types of dynptrs. This patch starts with the most
> > basic ones, ones that reference a program's local memory
> > (eg a stack variable) and ones that reference memory that is dynamically
> > allocated on behalf of the program. If the memory is dynamically
> > allocated by the program, the program *must* free it before the program
> > exits. This is enforced by the verifier.
> >
> > The added APIs are:
> >
> > long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> > void bpf_dynptr_put(struct bpf_dynptr *ptr);
> >
> > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > reside on the program's stack frame. As such, their state is tracked
> > in their corresponding stack slot, which includes the type of dynptr
> > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> >
> > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > where the dynptr resides at are marked as STACK_DYNPTR. For helper functions
> > that take in initialized dynptrs (such as the next patch in this series
> > which supports dynptr reads/writes), the verifier enforces that the
> > dynptr has been initialized by checking that their corresponding stack
> > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > (eg bpf_dynptr_put) will clear the stack slots. The verifier enforces at
> > program exit that there are no acquired dynptr stack slots that need
> > to be released.
> >
> > There are other constraints that are enforced by the verifier as
> > well, such as that the dynptr cannot be written to directly by the bpf
> > program or by non-dynptr helper functions. The last patch in this series
> > contains tests that trigger different cases that the verifier needs to
> > successfully reject.
> >
> > For now, local dynptrs cannot point to referenced memory since the
> > memory can be freed anytime. Support for this will be added as part
> > of a separate patchset.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  68 +++++-
> >  include/linux/bpf_verifier.h   |  28 +++
> >  include/uapi/linux/bpf.h       |  44 ++++
> >  kernel/bpf/helpers.c           | 110 ++++++++++
> >  kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
> >  scripts/bpf_doc.py             |   2 +
> >  tools/include/uapi/linux/bpf.h |  44 ++++
> >  7 files changed, 654 insertions(+), 14 deletions(-)
> >
[...]
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 8deb588a19ce..bf132c6822e4 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -187,6 +187,9 @@ struct bpf_verifier_stack_elem {
> >                                         POISON_POINTER_DELTA))
> >  #define BPF_MAP_PTR(X)               ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
> >
> > +/* forward declarations */
> > +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> > +
> >  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
> >  {
> >       return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
> > @@ -257,7 +260,9 @@ struct bpf_call_arg_meta {
> >       struct btf *ret_btf;
> >       u32 ret_btf_id;
> >       u32 subprogno;
> > -     bool release_ref;
> > +     u8 release_regno;
> > +     bool release_dynptr;
> > +     u8 uninit_dynptr_regno;
> >  };
> >
> >  struct btf *btf_vmlinux;
> > @@ -576,6 +581,7 @@ static char slot_type_char[] = {
> >       [STACK_SPILL]   = 'r',
> >       [STACK_MISC]    = 'm',
> >       [STACK_ZERO]    = '0',
> > +     [STACK_DYNPTR]  = 'd',
> >  };
> >
> >  static void print_liveness(struct bpf_verifier_env *env,
> > @@ -591,6 +597,25 @@ static void print_liveness(struct bpf_verifier_env *env,
> >               verbose(env, "D");
> >  }
> >
> > +static inline int get_spi(s32 off)
> > +{
> > +     return (-off - 1) / BPF_REG_SIZE;
> > +}
> > +
> > +static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, u32 nr_slots)
> > +{
> > +     int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > +
> > +     /* We need to check that slots between [spi - nr_slots + 1, spi] are
> > +      * within [0, allocated_stack).
> > +      *
> > +      * Please note that the spi grows downwards. For example, a dynptr
> > +      * takes the size of two stack slots; the first slot will be at
> > +      * spi and the second slot will be at spi - 1.
> > +      */
> > +     return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
> > +}
> > +
> >  static struct bpf_func_state *func(struct bpf_verifier_env *env,
> >                                  const struct bpf_reg_state *reg)
> >  {
> > @@ -642,6 +667,191 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
> >       env->scratched_stack_slots = ~0ULL;
> >  }
> >
> > +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC)
> > +
> > +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> > +{
> > +     switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > +     case DYNPTR_TYPE_LOCAL:
> > +             return BPF_DYNPTR_TYPE_LOCAL;
> > +     case DYNPTR_TYPE_MALLOC:
> > +             return BPF_DYNPTR_TYPE_MALLOC;
> > +     default:
> > +             return BPF_DYNPTR_TYPE_INVALID;
> > +     }
> > +}
> > +
> > +static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
> > +{
> > +     return type == BPF_DYNPTR_TYPE_MALLOC;
> > +}
> > +
> > +static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                enum bpf_arg_type arg_type)
> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     enum bpf_dynptr_type type;
> > +     int spi, i;
> > +
> > +     spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +             return -EINVAL;
> > +
> > +     type = arg_to_dynptr_type(arg_type);
> > +     if (type == BPF_DYNPTR_TYPE_INVALID)
> > +             return -EINVAL;
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++) {
> > +             state->stack[spi].slot_type[i] = STACK_DYNPTR;
> > +             state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> > +     }
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.type = type;
> > +     state->stack[spi - 1].spilled_ptr.dynptr.type = type;
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.first_slot = true;
> > +
> > +     return 0;
> > +}
> > +
> > +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     int spi, i;
> > +
> > +     spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +             return -EINVAL;
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++) {
> > +             state->stack[spi].slot_type[i] = STACK_INVALID;
> > +             state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > +     }
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.type = 0;
> > +     state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
> > +
> > +     return 0;
> > +}
> > +
> > +static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
> > +                            struct bpf_reg_state *regs)
> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     struct bpf_reg_state *reg, *mem_reg = NULL;
> > +     enum bpf_arg_type arg_type;
> > +     u64 mem_size;
> > +     u32 nr_slots;
> > +     int i, spi;
> > +
> > +     /* We must protect against the case where a program tries to do something
> > +      * like this:
> > +      *
> > +      * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> > +      * bpf_dynptr_alloc(16, 0, &ptr);
> > +      * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> > +      *
> > +      * If ptr is a variable on the stack, we must mark the stack slot as
> > +      * dynptr data when a local dynptr to it is created.
> > +      */
> > +     for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > +             arg_type = fn->arg_type[i];
> > +             reg = &regs[BPF_REG_1 + i];
> > +
> > +             if (base_type(arg_type) == ARG_PTR_TO_MEM) {
> > +                     if (base_type(reg->type) == PTR_TO_STACK) {
> > +                             mem_reg = reg;
> > +                             continue;
> > +                     }
> > +                     /* if it's not a PTR_TO_STACK, then we don't need to
> > +                      * mark anything since it can never be used as a dynptr.
> > +                      * We can just return here since there will always be
> > +                      * only one ARG_PTR_TO_MEM in fn.
> > +                      */
> > +                     return 0;
> > +             } else if (arg_type_is_mem_size(arg_type)) {
> > +                     mem_size = roundup(reg->var_off.value, BPF_REG_SIZE);
> > +             }
> > +     }
> > +
> > +     if (!mem_reg || !mem_size) {
> > +             verbose(env, "verifier internal error: invalid ARG_PTR_TO_MEM args for %s\n", __func__);
> > +             return -EFAULT;
> > +     }
> > +
> > +     spi = get_spi(mem_reg->off);
> > +     if (!is_spi_bounds_valid(state, spi, mem_size)) {
> > +             verbose(env, "verifier internal error: variable not initialized on stack in %s\n", __func__);
> > +             return -EFAULT;
> > +     }
> > +
> > +     nr_slots = mem_size / BPF_REG_SIZE;
> > +     for (i = 0; i < nr_slots; i++)
> > +             state->stack[spi - i].spilled_ptr.is_dynptr_data = true;
> > +
> > +     return 0;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                    bool *is_dynptr_data)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     int spi;
> > +
> > +     spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +             return true;
> > +
> > +     if (state->stack[spi].slot_type[0] == STACK_DYNPTR ||
> > +         state->stack[spi - 1].slot_type[0] == STACK_DYNPTR)
> > +             return false;
> > +
> > +     if (state->stack[spi].spilled_ptr.is_dynptr_data ||
> > +         state->stack[spi - 1].spilled_ptr.is_dynptr_data) {
> > +             *is_dynptr_data = true;
> > +             return false;
> > +     }
> > +
> > +     return true;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                  enum bpf_arg_type arg_type)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     int spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > +         state->stack[spi].slot_type[0] != STACK_DYNPTR ||
> > +         state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
> > +         !state->stack[spi].spilled_ptr.dynptr.first_slot)
> > +             return false;
> > +
> > +     /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > +     if (arg_type == ARG_PTR_TO_DYNPTR)
> > +             return true;
> > +
> > +     return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> > +}
> > +
> > +static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
> > +{
> > +     int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
> > +     int i;
> > +
> > +     for (i = 0; i < nr_slots; i++) {
> > +             if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  /* The reg state of a pointer or a bounded scalar was saved when
> >   * it was spilled to the stack.
> >   */
> > @@ -2878,6 +3088,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >       }
> >
> >       mark_stack_slot_scratched(env, spi);
> > +
> > +     if (stack_access_into_dynptr(state, spi, size)) {
> > +             verbose(env, "direct write into dynptr is not permitted\n");
> > +             return -EINVAL;
> > +     }
> > +
> >       if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
> >           !register_is_null(reg) && env->bpf_capable) {
> >               if (dst_reg != BPF_REG_FP) {
> > @@ -2999,6 +3215,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> >               slot = -i - 1;
> >               spi = slot / BPF_REG_SIZE;
> >               stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> > +
> > +             if (*stype == STACK_DYNPTR) {
> > +                     verbose(env, "direct write into dynptr is not permitted\n");
> > +                     return -EINVAL;
> > +             }
> > +
> >               mark_stack_slot_scratched(env, spi);
> >
> >               if (!env->allow_ptr_leaks
> > @@ -5141,6 +5363,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
> >              type == ARG_PTR_TO_LONG;
> >  }
> >
> > +static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
> > +{
> > +     return base_type(type) == ARG_PTR_TO_DYNPTR;
> > +}
> > +
> > +static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
> > +{
> > +     return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
> > +}
> > +
> >  static int int_ptr_type_to_size(enum bpf_arg_type type)
> >  {
> >       if (type == ARG_PTR_TO_INT)
> > @@ -5278,6 +5510,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >       [ARG_PTR_TO_STACK]              = &stack_ptr_types,
> >       [ARG_PTR_TO_CONST_STR]          = &const_str_ptr_types,
> >       [ARG_PTR_TO_TIMER]              = &timer_types,
> > +     [ARG_PTR_TO_DYNPTR]             = &stack_ptr_types,
> >  };
> >
> >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > @@ -5450,10 +5683,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               return err;
> >
> >  skip_type_check:
> > -     /* check_func_arg_reg_off relies on only one referenced register being
> > -      * allowed for BPF helpers.
> > -      */
> >       if (reg->ref_obj_id) {
> > +             if (arg_type & NO_OBJ_REF) {
> > +                     verbose(env, "Arg #%d cannot be a referenced object\n",
> > +                             arg + 1);
> > +                     return -EINVAL;
> > +             }
> > +
> > +             /* check_func_arg_reg_off relies on only one referenced register being
> > +              * allowed for BPF helpers.
> > +              */
> >               if (meta->ref_obj_id) {
> >                       verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> >                               regno, reg->ref_obj_id,
> > @@ -5463,16 +5702,26 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               meta->ref_obj_id = reg->ref_obj_id;
> >       }
> >       if (arg_type & OBJ_RELEASE) {
> > -             if (!reg->ref_obj_id) {
> > +             if (arg_type_is_dynptr(arg_type)) {
> > +                     struct bpf_func_state *state = func(env, reg);
> > +                     int spi = get_spi(reg->off);
> > +
> > +                     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > +                         !state->stack[spi].spilled_ptr.id) {
> > +                             verbose(env, "arg %d is an unacquired reference\n", regno);
> > +                             return -EINVAL;
> > +                     }
> > +                     meta->release_dynptr = true;
> > +             } else if (!reg->ref_obj_id) {
> >                       verbose(env, "arg %d is an unacquired reference\n", regno);
> >                       return -EINVAL;
> >               }
> > -             if (meta->release_ref) {
> > -                     verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
> > -                             regno);
> > +             if (meta->release_regno) {
> > +                     verbose(env, "verifier internal error: more than one release_regno %u %u\n",
> > +                             meta->release_regno, regno);
> >                       return -EFAULT;
> >               }
> > -             meta->release_ref = true;
> > +             meta->release_regno = regno;
> >       }
> >
> >       if (arg_type == ARG_CONST_MAP_PTR) {
> > @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> >
> >               err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > +     } else if (arg_type_is_dynptr(arg_type)) {
> > +             /* Can't pass in a dynptr at a weird offset */
> > +             if (reg->off % BPF_REG_SIZE) {
>
> In invalid_helper2 test, you are passing &dynptr + 8, which means reg will be
> fp-8 (assuming dynptr is at top of stack), get_spi will compute spi as 0, so
> spi-1 will lead to OOB access for the second dynptr stack slot. If you run the
> dynptr test under KASAN, you should see a warning for this.
>
> So we should ensure here that reg->off is at least -16.
I think this is already checked in is_spi_bounds_valid(), where we
explicitly check that both spi - 1 and spi are within [0, the
allocated stack). is_spi_bounds_valid() gets called in
is_dynptr_reg_valid_init() a few lines down, where we check whether
the initialized dynptr arg passed in by the program is valid.

In my local environment, I simulated this "reg->off = -8" case: it
fails the is_dynptr_reg_valid_init() -> is_spi_bounds_valid() check
and we get back the correct verifier error "Expected an initialized
dynptr as arg #3" without any OOB accesses. I also tried running it
with CONFIG_KASAN=y and didn't see any warnings show up. But maybe
I'm missing something in this analysis - what are your thoughts?
>
> > +                     verbose(env, "cannot pass in non-zero dynptr offset\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (arg_type & MEM_UNINIT)  {
> > +                     bool is_dynptr_data = false;
> > +
> > +                     if (!is_dynptr_reg_valid_uninit(env, reg, &is_dynptr_data)) {
> > +                             if (is_dynptr_data)
> > +                                     verbose(env, "Arg #%d cannot be a memory reference for another dynptr\n",
> > +                                             arg + 1);
> > +                             else
> > +                                     verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> > +                                             arg + 1);
> > +                             return -EINVAL;
> > +                     }
> > +
> > +                     meta->uninit_dynptr_regno = arg + BPF_REG_1;
> > +             } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> > +                     const char *err_extra = "";
> > +
> > +                     switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > +                     case DYNPTR_TYPE_LOCAL:
> > +                             err_extra = "local ";
> > +                             break;
> > +                     case DYNPTR_TYPE_MALLOC:
> > +                             err_extra = "malloc ";
> > +                             break;
> > +                     default:
> > +                             break;
> > +                     }
> > +                     verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > +                             err_extra, arg + 1);
> > +                     return -EINVAL;
> > +             }
> >       } else if (arg_type_is_alloc_size(arg_type)) {
> >               if (!tnum_is_const(reg->var_off)) {
> >                       verbose(env, "R%d is not a known constant'\n",
> > @@ -6545,6 +6832,28 @@ static int check_reference_leak(struct bpf_verifier_env *env)
> >       return state->acquired_refs ? -EINVAL : 0;
> >  }
> >
> > +/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
> > + * not been released. Dynptrs to local memory do not need to be released.
> > + */
> > +static int check_dynptr_unreleased(struct bpf_verifier_env *env)
> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     int allocated_slots, i;
> > +
> > +     allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > +
> > +     for (i = 0; i < allocated_slots; i++) {
> > +             if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
> > +                     if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
> > +                             verbose(env, "spi=%d is an unreleased dynptr\n", i);
> > +                             return -EINVAL;
> > +                     }
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
> >                                  struct bpf_reg_state *regs)
> >  {
> > @@ -6686,8 +6995,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                       return err;
> >       }
> >
> > -     if (meta.release_ref) {
> > -             err = release_reference(env, meta.ref_obj_id);
> > +     regs = cur_regs(env);
> > +
> > +     if (meta.uninit_dynptr_regno) {
> > +             enum bpf_arg_type type;
> > +
> > +             /* we write BPF_W bits (4 bytes) at a time */
> > +             for (i = 0; i < BPF_DYNPTR_SIZE; i += 4) {
> > +                     err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
> > +                                            i, BPF_W, BPF_WRITE, -1, false);
> > +                     if (err)
> > +                             return err;
> > +             }
> > +
> > +             type = fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1];
> > +
> > +             err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno], type);
> > +             if (err)
> > +                     return err;
> > +
> > +             if (type & DYNPTR_TYPE_LOCAL) {
> > +                     err = mark_as_dynptr_data(env, fn, regs);
> > +                     if (err)
> > +                             return err;
> > +             }
> > +     }
> > +
> > +     if (meta.release_regno) {
> > +             if (meta.release_dynptr) {
> > +                     err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
> > +             } else {
> > +                     err = release_reference(env, meta.ref_obj_id);
> > +             }
> >               if (err) {
> >                       verbose(env, "func %s#%d reference has not been acquired before\n",
> >                               func_id_name(func_id), func_id);
> > @@ -6695,8 +7034,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >               }
> >       }
> >
> > -     regs = cur_regs(env);
> > -
> >       switch (func_id) {
> >       case BPF_FUNC_tail_call:
> >               err = check_reference_leak(env);
> > @@ -6704,6 +7041,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                       verbose(env, "tail_call would lead to reference leak\n");
> >                       return err;
> >               }
> > +             err = check_dynptr_unreleased(env);
> > +             if (err) {
> > +                     verbose(env, "tail_call would lead to dynptr memory leak\n");
> > +                     return err;
> > +             }
> >               break;
> >       case BPF_FUNC_get_local_storage:
> >               /* check that flags argument in get_local_storage(map, flags) is 0,
> > @@ -11696,6 +12038,10 @@ static int do_check(struct bpf_verifier_env *env)
> >                                       return -EINVAL;
> >                               }
> >
> > +                             err = check_dynptr_unreleased(env);
> > +                             if (err)
> > +                                     return err;
> > +
> >                               if (state->curframe) {
> >                                       /* exit from nested function */
> >                                       err = prepare_func_exit(env, &env->insn_idx);
[...]
> > --
> > 2.30.2
> >
>
> --
> Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-18 22:20     ` Joanne Koong
@ 2022-04-18 23:57       ` Kumar Kartikeya Dwivedi
  2022-04-19 19:23         ` Joanne Koong
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-18 23:57 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 19, 2022 at 03:50:38AM IST, Joanne Koong wrote:
> On Sat, Apr 16, 2022 at 10:42 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> > [...]
> > >
> > >       if (arg_type == ARG_CONST_MAP_PTR) {
> > > @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > >               bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > >
> > >               err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > +     } else if (arg_type_is_dynptr(arg_type)) {
> > > +             /* Can't pass in a dynptr at a weird offset */
> > > +             if (reg->off % BPF_REG_SIZE) {
> >
> > In invalid_helper2 test, you are passing &dynptr + 8, which means reg will be
> > fp-8 (assuming dynptr is at top of stack), get_spi will compute spi as 0, so
> > spi-1 will lead to OOB access for the second dynptr stack slot. If you run the
> > dynptr test under KASAN, you should see a warning for this.
> >
> > So we should ensure here that reg->off is at least -16.
> I think this is already checked in is_spi_bounds(), where we
> explicitly check that spi - 1 and spi are within [0, the allocated
> stack). is_spi_bounds() gets called in "is_dynptr_reg_valid_init()" a
> few lines down, where we check if the initialized dynptr arg that's
> passed in by the program is valid.
>
> On my local environment, I simulated this "reg->off = -8" case and
> this fails the is_dynptr_reg_valid_init() -> is_spi_bounds() check and
> we get back the correct verifier error "Expected an initialized dynptr
> as arg #3" without any OOB accesses. I also tried running it with
> CONFIG_KASAN=y as well and didn't see any warnings show up. But maybe
> I'm missing something in this analysis - what are your thoughts?

I should have been clearer: the report is for accessing state->stack[spi -
1].slot_type[0] in is_dynptr_reg_valid_init, when the program is being loaded.

I can understand why you might not see the warning. It is accessing
state->stack[spi - 1], and the allocation comes from kmalloc slab cache, so if
another allocation has an object that covers the region being touched, KASAN
probably won't complain, and you won't see the warning.

Getting back the correct result for the test can also happen if you don't load
STACK_DYNPTR at the state->stack[spi - 1].slot_type[0] byte. The test is passing
for me too, fwiw.

Anyway, digging into this reveals the real problem.

>>> static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>>> 				     enum bpf_arg_type arg_type)
>>> {
>>> 	struct bpf_func_state *state = func(env, reg);
>>> 	int spi = get_spi(reg->off);
>>>

Here, for reg->off = -8, get_spi is (-(-8) - 1)/BPF_REG_SIZE = 0.

>>> 	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||

is_spi_bounds_valid will return true, probably because of the unintended
conversion of the expression (spi - nr_slots + 1) to unsigned: the test
against >= 0 is always true (the compiler will optimize it out), so it only
tests whether spi < allocated_stacks.

You should probably declare nr_slots as int, instead of u32. Just doing this
should be enough to prevent this, without ensuring reg->off is <= -16.

>>> 	    state->stack[spi].slot_type[0] != STACK_DYNPTR ||

Execution moves on to this check, which passes (the second dynptr slot is STACK_DYNPTR).

>>> 	    state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||

and it accesses state->stack[-1].slot_type[0] here, which triggers the KASAN
warning for me.

>>> 	    !state->stack[spi].spilled_ptr.dynptr.first_slot)
>>> 		return false;
>>>
>>> 	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
>>> 	if (arg_type == ARG_PTR_TO_DYNPTR)
>>> 		return true;
>>>
>>> 	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
>>> }

> > [...]

There is another issue I noticed while basing other work on this. You have
declared bpf_dynptr in UAPI header as:

	struct bpf_dynptr {
		__u64 :64;
		__u64 :64;
	} __attribute__((aligned(8)));

Sadly, in the C standard, the compiler is under no obligation to initialize
padding bits when the object is zero-initialized (using = {}). It is worse
when unrelated struct fields are assigned: the padding bits are assumed to
attain unspecified values, though compilers are usually conservative in that
case (C11 6.2.6.1 p6).

See 5eaed6eedbe9 ("bpf: Fix a bpf_timer initialization issue") on how this has
bitten us once before.

I was kinda surprised you don't hit this with your selftests, since in the BPF
assembly of dynptr_fail.o/dynptr_success.o I seldom see stack location of dynptr
being zeroed out. But after applying the fix for the above issue, I see this
error and many failing tests (only 26/36 passed).

verifier internal error: variable not initialized on stack in mark_as_dynptr_data

So I think the bug above was papering over this issue? I will look at it in more
detail later.

--
Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-16  6:34 ` [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
@ 2022-04-19  4:59   ` Alexei Starovoitov
  2022-04-19 19:26     ` Joanne Koong
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2022-04-19  4:59 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, memxor, ast, daniel, toke

On Fri, Apr 15, 2022 at 11:34:23PM -0700, Joanne Koong wrote:
> -	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
> +	/* pointer to memory does not need to be initialized, helper function must fill
> +	 * all bytes or clear them in error case.
> +	 */
> +	ARG_PTR_TO_MEM_UNINIT		= MEM_UNINIT | ARG_PTR_TO_MEM,

Could you keep the name as ARG_PTR_TO_UNINIT_MEM ?
This will avoid churn in all the lines below.

> -	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
...
> -	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_PTR_TO_MEM_UNINIT,
...
> -	if (fn->arg1_type == ARG_PTR_TO_UNINIT_MEM)
> +	if (fn->arg1_type == ARG_PTR_TO_MEM_UNINIT)
>  		count++;
> -	if (fn->arg2_type == ARG_PTR_TO_UNINIT_MEM)
> +	if (fn->arg2_type == ARG_PTR_TO_MEM_UNINIT)
>  		count++;
> -	if (fn->arg3_type == ARG_PTR_TO_UNINIT_MEM)
> +	if (fn->arg3_type == ARG_PTR_TO_MEM_UNINIT)
>  		count++;
> -	if (fn->arg4_type == ARG_PTR_TO_UNINIT_MEM)
> +	if (fn->arg4_type == ARG_PTR_TO_MEM_UNINIT)
>  		count++;
> -	if (fn->arg5_type == ARG_PTR_TO_UNINIT_MEM)
> +	if (fn->arg5_type == ARG_PTR_TO_MEM_UNINIT)
>  		count++;
etc.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-18 23:57       ` Kumar Kartikeya Dwivedi
@ 2022-04-19 19:23         ` Joanne Koong
  2022-04-19 20:18           ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 27+ messages in thread
From: Joanne Koong @ 2022-04-19 19:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, Apr 18, 2022 at 4:57 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 03:50:38AM IST, Joanne Koong wrote:
> > On Sat, Apr 16, 2022 at 10:42 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > > [...]
> > > >
> > > >       if (arg_type == ARG_CONST_MAP_PTR) {
> > > > @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> > > >               bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> > > >
> > > >               err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > > > +     } else if (arg_type_is_dynptr(arg_type)) {
> > > > +             /* Can't pass in a dynptr at a weird offset */
> > > > +             if (reg->off % BPF_REG_SIZE) {
> > >
> > > In invalid_helper2 test, you are passing &dynptr + 8, which means reg will be
> > > fp-8 (assuming dynptr is at top of stack), get_spi will compute spi as 0, so
> > > spi-1 will lead to OOB access for the second dynptr stack slot. If you run the
> > > dynptr test under KASAN, you should see a warning for this.
> > >
> > > So we should ensure here that reg->off is at least -16.
> > I think this is already checked against in is_spi_bounds(), where we
> > explicitly check that spi - 1 and spi is between [0, the allocated
> > stack). is_spi_bounds() gets called in "is_dynptr_reg_valid_init()" a
> > few lines down where we check if the initialized dynptr arg that's
> > passed in by the program is valid.
> >
> > On my local environment, I simulated this "reg->off = -8" case and
> > this fails the is_dynptr_reg_valid_init() -> is_spi_bounds() check and
> > we get back the correct verifier error "Expected an initialized dynptr
> > as arg #3" without any OOB accesses. I also tried running it with
> > CONFIG_KASAN=y as well and didn't see any warnings show up. But maybe
> > I'm missing something in this analysis - what are your thoughts?
>
> I should have been clearer, the report is for accessing state->stack[spi -
> 1].slot_type[0] in is_dynptr_reg_valid_init, when the program is being loaded.
>
> I can understand why you might not see the warning. It is accessing
> state->stack[spi - 1], and the allocation comes from kmalloc slab cache, so if
> another allocation has an object that covers the region being touched, KASAN
> probably won't complain, and you won't see the warning.
>
> Getting back the correct result for the test can also happen if you don't load
> STACK_DYNPTR at the state->stack[spi - 1].slot_type[0] byte. The test is passing
> for me too, fwiw.
>
> Anyway, digging into this reveals the real problem.
>
> >>> static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> >>>                                  enum bpf_arg_type arg_type)
> >>> {
> >>>     struct bpf_func_state *state = func(env, reg);
> >>>     int spi = get_spi(reg->off);
> >>>
>
> Here, for reg->off = -8, get_spi is (-(-8) - 1)/BPF_REG_SIZE = 0.
>
> >>>     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
>
> is_spi_bounds_valid will return true, probably because of the unintended
> conversion of the expression (spi - nr_slots + 1) to unsigned, so the test
Oh interesting. I missed that in arithmetic between int and unsigned
operands of the same conversion rank, the int operand is converted to
unsigned. Thanks for pointing this out. I'll change nr_slots to int.

> against >= 0 is always true (compiler will optimize it out), and just test
> whether spi < allocated_stacks.
>
> You should probably declare nr_slots as int, instead of u32. Just doing this
> should be enough to prevent this, without ensuring reg->off is <= -16.
>
> >>>         state->stack[spi].slot_type[0] != STACK_DYNPTR ||
>
> Execution moves on to this, which is (second dynptr slot is STACK_DYNPTR).
>
> >>>         state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
>
> and it accesses state->stack[-1].slot_type[0] here, which triggers the KASAN
> warning for me.
>
> >>>         !state->stack[spi].spilled_ptr.dynptr.first_slot)
> >>>             return false;
> >>>
> >>>     /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> >>>     if (arg_type == ARG_PTR_TO_DYNPTR)
> >>>             return true;
> >>>
> >>>     return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> >>> }
>
> > > [...]
>
> There is another issue I noticed while basing other work on this. You have
> declared bpf_dynptr in UAPI header as:
>
>         struct bpf_dynptr {
>                 __u64 :64;
>                 __u64 :64;
>         } __attribute__((aligned(8)));
>
> Sadly, in C standard, the compiler is under no obligation to initialize padding
> bits when the object is zero initialized (using = {}). It is worse, when
> unrelated struct fields are assigned the padding bits are assumed to attain
> unspecified values, but compilers are usually conservative in that case (C11
> 6.2.6.1 p6).
Thanks for noting this. By "padding bits", you are referring to the
unnamed fields, correct?

From the commit message in 5eaed6eedbe9, I see:

INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
  Programming languages — C
  http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
  page 157:
  Except where explicitly stated otherwise, for the purposes of
  this subclause unnamed members of objects of structure and union
  type do not participate in initialization. Unnamed members of
  structure objects have indeterminate value even after initialization.

so it seems like the best way to address that here is to just have the
fields be explicitly named, something like

struct bpf_dynptr {
    __u64 anon1;
    __u64 anon2;
} __attribute__((aligned(8)))

Do you agree with this assessment?

>
> See 5eaed6eedbe9 ("bpf: Fix a bpf_timer initialization issue") on how this has
> bitten us once before.
>
> I was kinda surprised you don't hit this with your selftests, since in the BPF
> assembly of dynptr_fail.o/dynptr_success.o I seldom see stack location of dynptr
> being zeroed out. But after applying the fix for the above issue, I see this
> error and many failing tests (only 26/36 passed).
>
> verifier internal error: variable not initialized on stack in mark_as_dynptr_data
>
> So I think the bug above was papering over this issue? I will look at it in more
> detail later.
>
> --
> Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag
  2022-04-19  4:59   ` Alexei Starovoitov
@ 2022-04-19 19:26     ` Joanne Koong
  0 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-19 19:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Mon, Apr 18, 2022 at 9:59 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 15, 2022 at 11:34:23PM -0700, Joanne Koong wrote:
> > -     ARG_PTR_TO_UNINIT_MEM,  /* pointer to memory does not need to be initialized,
> > +     /* pointer to memory does not need to be initialized, helper function must fill
> > +      * all bytes or clear them in error case.
> > +      */
> > +     ARG_PTR_TO_MEM_UNINIT           = MEM_UNINIT | ARG_PTR_TO_MEM,
>
> Could you keep the name as ARG_PTR_TO_UNINIT_MEM ?
> This will avoid churn in all the lines below.
>
> > -     .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
> > +     .arg2_type      = ARG_PTR_TO_MEM_UNINIT,
> ...
> > -     .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
> > +     .arg2_type      = ARG_PTR_TO_MEM_UNINIT,
> ...
> > -     if (fn->arg1_type == ARG_PTR_TO_UNINIT_MEM)
> > +     if (fn->arg1_type == ARG_PTR_TO_MEM_UNINIT)
> >               count++;
> > -     if (fn->arg2_type == ARG_PTR_TO_UNINIT_MEM)
> > +     if (fn->arg2_type == ARG_PTR_TO_MEM_UNINIT)
> >               count++;
> > -     if (fn->arg3_type == ARG_PTR_TO_UNINIT_MEM)
> > +     if (fn->arg3_type == ARG_PTR_TO_MEM_UNINIT)
> >               count++;
> > -     if (fn->arg4_type == ARG_PTR_TO_UNINIT_MEM)
> > +     if (fn->arg4_type == ARG_PTR_TO_MEM_UNINIT)
> >               count++;
> > -     if (fn->arg5_type == ARG_PTR_TO_UNINIT_MEM)
> > +     if (fn->arg5_type == ARG_PTR_TO_MEM_UNINIT)
> >               count++;
> etc.
Will do - I'll make this edit for v3.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-19 19:23         ` Joanne Koong
@ 2022-04-19 20:18           ` Kumar Kartikeya Dwivedi
  2022-04-20 21:15             ` Joanne Koong
  0 siblings, 1 reply; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-19 20:18 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Wed, Apr 20, 2022 at 12:53:55AM IST, Joanne Koong wrote:
> > [...]
> > There is another issue I noticed while basing other work on this. You have
> > declared bpf_dynptr in UAPI header as:
> >
> >         struct bpf_dynptr {
> >                 __u64 :64;
> >                 __u64 :64;
> >         } __attribute__((aligned(8)));
> >
> > Sadly, in C standard, the compiler is under no obligation to initialize padding
> > bits when the object is zero initialized (using = {}). It is worse, when
> > unrelated struct fields are assigned the padding bits are assumed to attain
> > unspecified values, but compilers are usually conservative in that case (C11
> > 6.2.6.1 p6).
> Thanks for noting this. By "padding bits", you are referring to the
> unnamed fields, correct?
>
> From the commit message in 5eaed6eedbe9, I see:
>
> INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
>   Programming languages — C
>   http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
>   page 157:
>   Except where explicitly stated otherwise, for the purposes of
>   this subclause unnamed members of objects of structure and union
>   type do not participate in initialization. Unnamed members of
>   structure objects have indeterminate value even after initialization.
>
> so it seems like the best way to address that here is to just have the
> fields be explicitly named, like something like
>
> struct bpf_dynptr {
>     __u64 anon1;
>     __u64 anon2;
> } __attribute__((aligned(8)))
>
> Do you agree with this assessment?
>

Yes, this should work. Also, maybe the 'variable not initialized' error shouldn't be
a 'verifier internal error', since it would be quite common for users to hit it.

--
Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
  2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
@ 2022-04-19 20:35   ` Kumar Kartikeya Dwivedi
  2022-04-22  2:52   ` Alexei Starovoitov
  2 siblings, 0 replies; 27+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-04-19 20:35 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, ast, daniel, toke

On Sat, Apr 16, 2022 at 12:04:25PM IST, Joanne Koong wrote:
> This patch adds 3 new APIs and the bulk of the verifier work for
> supporting dynamic pointers in bpf.
>
> There are different types of dynptrs. This patch starts with the most
> basic ones, ones that reference a program's local memory
> (eg a stack variable) and ones that reference memory that is dynamically
> allocated on behalf of the program. If the memory is dynamically
> allocated by the program, the program *must* free it before the program
> exits. This is enforced by the verifier.
>
> The added APIs are:
>
> long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
> long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_dynptr_put(struct bpf_dynptr *ptr);
>
> This patch sets up the verifier to support dynptrs. Dynptrs will always
> reside on the program's stack frame. As such, their state is tracked
> in their corresponding stack slot, which includes the type of dynptr
> (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
>
> When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> MEM_UNINIT), the stack slots corresponding to the frame pointer
> where the dynptr resides at are marked as STACK_DYNPTR. For helper functions
> that take in initialized dynptrs (such as the next patch in this series
> which supports dynptr reads/writes), the verifier enforces that the
> dynptr has been initialized by checking that their corresponding stack
> slots have been marked as STACK_DYNPTR. Dynptr release functions
> (eg bpf_dynptr_put) will clear the stack slots. The verifier enforces at
> program exit that there are no acquired dynptr stack slots that need
> to be released.
>
> There are other constraints that are enforced by the verifier as
> well, such as that the dynptr cannot be written to directly by the bpf
> program or by non-dynptr helper functions. The last patch in this series
> contains tests that trigger different cases that the verifier needs to
> successfully reject.
>
> For now, local dynptrs cannot point to referenced memory since the
> memory can be freed anytime. Support for this will be added as part
> of a separate patchset.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  68 +++++-
>  include/linux/bpf_verifier.h   |  28 +++
>  include/uapi/linux/bpf.h       |  44 ++++
>  kernel/bpf/helpers.c           | 110 ++++++++++
>  kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  44 ++++
>  7 files changed, 654 insertions(+), 14 deletions(-)
>
> [...]
> +/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
> + * not been released. Dynptrs to local memory do not need to be released.
> + */
> +static int check_dynptr_unreleased(struct bpf_verifier_env *env)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int allocated_slots, i;
> +
> +	allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +	for (i = 0; i < allocated_slots; i++) {
> +		if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
> +			if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
> +				verbose(env, "spi=%d is an unreleased dynptr\n", i);
> +				return -EINVAL;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}

We need to call this function in check_ld_abs as well.

> [...]

--
Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-19 20:18           ` Kumar Kartikeya Dwivedi
@ 2022-04-20 21:15             ` Joanne Koong
  0 siblings, 0 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-20 21:15 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 19, 2022 at 1:18 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, Apr 20, 2022 at 12:53:55AM IST, Joanne Koong wrote:
> > > [...]
> > > There is another issue I noticed while basing other work on this. You have
> > > declared bpf_dynptr in UAPI header as:
> > >
> > >         struct bpf_dynptr {
> > >                 __u64 :64;
> > >                 __u64 :64;
> > >         } __attribute__((aligned(8)));
> > >
> > > Sadly, in C standard, the compiler is under no obligation to initialize padding
> > > bits when the object is zero initialized (using = {}). It is worse, when
> > > unrelated struct fields are assigned the padding bits are assumed to attain
> > > unspecified values, but compilers are usually conservative in that case (C11
> > > 6.2.6.1 p6).
> > Thanks for noting this. By "padding bits", you are referring to the
> > unnamed fields, correct?
> >
> > From the commit message in 5eaed6eedbe9, I see:
> >
> > INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> >   Programming languages — C
> >   http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
> >   page 157:
> >   Except where explicitly stated otherwise, for the purposes of
> >   this subclause unnamed members of objects of structure and union
> >   type do not participate in initialization. Unnamed members of
> >   structure objects have indeterminate value even after initialization.
> >
> > so it seems like the best way to address that here is to just have the
> > fields be explicitly named, like something like
> >
> > struct bpf_dynptr {
> >     __u64 anon1;
> >     __u64 anon2;
> > } __attribute__((aligned(8)))
> >
> > Do you agree with this assessment?
> >
>
> Yes, this should work. Also, maybe 'variable not initialized error' shouldn't be
> 'verifier internal error', since it would quite common for user to hit it.
>
I looked into this some more and I don't think it's an issue that the
compiler doesn't initialize anonymous fields and/or leaves them with
indeterminate values. We set up the dynptr in bpf_dynptr_from_mem() and
bpf_dynptr_alloc(), where we initialize its contents with real values. It
doesn't matter if, prior to bpf_dynptr_from_mem()/bpf_dynptr_alloc(), it's
filled with garbage values, because they'll be overwritten.

The "verifier internal error: variable not initialized on stack in
mark_as_dynptr_data" error you were seeing is unrelated to this. It's
caused by a mistake in mark_as_dynptr_data(): when we check that the
memory size of the data is within the spi bounds, the 3rd argument we
pass to is_spi_bounds_valid() should be the number of slots, not the
memory size (that is, mem_size / BPF_REG_SIZE, not mem_size). Changing
this fixes the error.

> --
> Kartikeya

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
  2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
  2022-04-19 20:35   ` Kumar Kartikeya Dwivedi
@ 2022-04-22  2:52   ` Alexei Starovoitov
  2022-04-26 23:45     ` Joanne Koong
  2 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2022-04-22  2:52 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, andrii, memxor, ast, daniel, toke

On Fri, Apr 15, 2022 at 11:34:25PM -0700, Joanne Koong wrote:
> This patch adds 3 new APIs and the bulk of the verifier work for
> supporting dynamic pointers in bpf.
> 
> There are different types of dynptrs. This patch starts with the most
> basic ones, ones that reference a program's local memory
> (eg a stack variable) and ones that reference memory that is dynamically
> allocated on behalf of the program. If the memory is dynamically
> allocated by the program, the program *must* free it before the program
> exits. This is enforced by the verifier.
> 
> The added APIs are:
> 
> long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
> long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> void bpf_dynptr_put(struct bpf_dynptr *ptr);
> 
> This patch sets up the verifier to support dynptrs. Dynptrs will always
> reside on the program's stack frame. As such, their state is tracked
> in their corresponding stack slot, which includes the type of dynptr
> (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> 
> When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> MEM_UNINIT), the stack slots corresponding to the frame pointer
> where the dynptr resides at are marked as STACK_DYNPTR. For helper functions
> that take in initialized dynptrs (such as the next patch in this series
> which supports dynptr reads/writes), the verifier enforces that the
> dynptr has been initialized by checking that their corresponding stack
> slots have been marked as STACK_DYNPTR. Dynptr release functions
> (eg bpf_dynptr_put) will clear the stack slots. The verifier enforces at
> program exit that there are no acquired dynptr stack slots that need
> to be released.
> 
> There are other constraints that are enforced by the verifier as
> well, such as that the dynptr cannot be written to directly by the bpf
> program or by non-dynptr helper functions. The last patch in this series
> contains tests that trigger different cases that the verifier needs to
> successfully reject.
> 
> For now, local dynptrs cannot point to referenced memory since the
> memory can be freed anytime. Support for this will be added as part
> of a separate patchset.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h            |  68 +++++-
>  include/linux/bpf_verifier.h   |  28 +++
>  include/uapi/linux/bpf.h       |  44 ++++
>  kernel/bpf/helpers.c           | 110 ++++++++++
>  kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
>  scripts/bpf_doc.py             |   2 +
>  tools/include/uapi/linux/bpf.h |  44 ++++
>  7 files changed, 654 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 29964cdb1dd6..fee91b07ee74 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -346,7 +346,16 @@ enum bpf_type_flag {
>  
>  	OBJ_RELEASE		= BIT(6 + BPF_BASE_TYPE_BITS),
>  
> -	__BPF_TYPE_LAST_FLAG	= OBJ_RELEASE,
> +	/* DYNPTR points to a program's local memory (eg stack variable). */
> +	DYNPTR_TYPE_LOCAL	= BIT(7 + BPF_BASE_TYPE_BITS),
> +
> +	/* DYNPTR points to dynamically allocated memory. */
> +	DYNPTR_TYPE_MALLOC	= BIT(8 + BPF_BASE_TYPE_BITS),
> +
> +	/* May not be a referenced object */
> +	NO_OBJ_REF		= BIT(9 + BPF_BASE_TYPE_BITS),
> +
> +	__BPF_TYPE_LAST_FLAG	= NO_OBJ_REF,
>  };
>  
>  /* Max number of base types. */
> @@ -390,6 +399,7 @@ enum bpf_arg_type {
>  	ARG_PTR_TO_STACK,	/* pointer to stack */
>  	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
>  	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
> +	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
>  	__BPF_ARG_TYPE_MAX,
>  
>  	/* Extended arg_types. */
> @@ -2394,4 +2404,60 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
>  			u32 **bin_buf, u32 num_args);
>  void bpf_bprintf_cleanup(void);
>  
> +/* the implementation of the opaque uapi struct bpf_dynptr */
> +struct bpf_dynptr_kern {
> +	void *data;
> +	/* Size represents the number of usable bytes in the dynptr.
> +	 * If for example the offset is at 200 for a malloc dynptr with
> +	 * allocation size 256, the number of usable bytes is 56.
> +	 *
> +	 * The upper 8 bits are reserved.
> +	 * Bit 31 denotes whether the dynptr is read-only.
> +	 * Bits 28-30 denote the dynptr type.
> +	 */
> +	u32 size;
> +	u32 offset;
> +} __aligned(8);
> +
> +enum bpf_dynptr_type {
> +	BPF_DYNPTR_TYPE_INVALID,
> +	/* Local memory used by the bpf program (eg stack variable) */
> +	BPF_DYNPTR_TYPE_LOCAL,
> +	/* Memory allocated dynamically by the kernel for the dynptr */
> +	BPF_DYNPTR_TYPE_MALLOC,
> +};
> +
> +/* Since the upper 8 bits of dynptr->size are reserved, the
> + * maximum supported size is 2^24 - 1.
> + */
> +#define DYNPTR_MAX_SIZE	((1UL << 24) - 1)
> +#define DYNPTR_SIZE_MASK	0xFFFFFF
> +#define DYNPTR_TYPE_SHIFT	28
> +#define DYNPTR_TYPE_MASK	0x7
> +
> +static inline enum bpf_dynptr_type bpf_dynptr_get_type(struct bpf_dynptr_kern *ptr)
> +{
> +	return (ptr->size >> DYNPTR_TYPE_SHIFT) & DYNPTR_TYPE_MASK;
> +}
> +
> +static inline void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_type type)
> +{
> +	ptr->size |= type << DYNPTR_TYPE_SHIFT;
> +}
> +
> +static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> +{
> +	return ptr->size & DYNPTR_SIZE_MASK;
> +}
> +
> +static inline int bpf_dynptr_check_size(u32 size)
> +{
> +	return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
> +}
> +
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +		     u32 offset, u32 size);
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> +
>  #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 7a01adc9e13f..e11440a44e92 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -72,6 +72,27 @@ struct bpf_reg_state {
>  
>  		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
>  
> +		/* For dynptr stack slots */
> +		struct {
> +			enum bpf_dynptr_type type;
> +			/* A dynptr is 16 bytes so it takes up 2 stack slots.
> +			 * We need to track which slot is the first slot
> +			 * to protect against cases where the user may try to
> +			 * pass in an address starting at the second slot of the
> +			 * dynptr.
> +			 */
> +			bool first_slot;
> +		} dynptr;
> +		/* For stack slots that a local dynptr points to. We need to track
> +		 * this to prohibit programs from using stack variables that are
> +		 * pointed to by dynptrs as a dynptr, eg something like
> +		 *
> +		 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> +		 * bpf_dynptr_alloc(16, 0, &ptr);
> +		 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> +		 */
> +		bool is_dynptr_data;
> +
>  		/* Max size from any of the above. */
>  		struct {
>  			unsigned long raw1;
> @@ -174,9 +195,16 @@ enum bpf_stack_slot_type {
>  	STACK_SPILL,      /* register spilled into stack */
>  	STACK_MISC,	  /* BPF program wrote some data into this slot */
>  	STACK_ZERO,	  /* BPF program wrote constant zero */
> +	/* A dynptr is stored in this stack slot. The type of dynptr
> +	 * is stored in bpf_stack_state->spilled_ptr.dynptr.type
> +	 */
> +	STACK_DYNPTR,
>  };
>  
>  #define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
> +/* size of a struct bpf_dynptr in bytes */
> +#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern)
> +#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE)
>  
>  struct bpf_stack_state {
>  	struct bpf_reg_state spilled_ptr;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d14b10b85e51..e339b2697d9a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5143,6 +5143,42 @@ union bpf_attr {
>   *		The **hash_algo** is returned on success,
>   *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
>   *		invalid arguments are passed.
> + *
> + * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Get a dynptr to local memory *data*.
> + *
> + *		For a dynptr to a dynamic memory allocation, please use
> + *		bpf_dynptr_alloc instead.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		*flags* is currently unused.
> + *	Return
> + *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
> + *		-EINVAL if flags is not 0.
> + *
> + * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Allocate memory of *size* bytes.
> + *
> + *		Every call to bpf_dynptr_alloc must have a corresponding
> + *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
> + *		succeeded.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		Supported *flags* are __GFP_ZERO.
> + *	Return
> + *		0 on success, -ENOMEM if there is not enough memory for the
> + *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
> + *		if the flags are not supported.
> + *
> + * void bpf_dynptr_put(struct bpf_dynptr *ptr)
> + *	Description
> + *		Free memory allocated by bpf_dynptr_alloc.
> + *
> + *		After this operation, *ptr* will be an invalidated dynptr.
> + *	Return
> + *		Void.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5339,6 +5375,9 @@ union bpf_attr {
>  	FN(copy_from_user_task),	\
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
> +	FN(dynptr_from_mem),		\
> +	FN(dynptr_alloc),		\
> +	FN(dynptr_put),			\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6486,6 +6525,11 @@ struct bpf_timer {
>  	__u64 :64;
>  } __attribute__((aligned(8)));
>  
> +struct bpf_dynptr {
> +	__u64 :64;
> +	__u64 :64;
> +} __attribute__((aligned(8)));
> +
>  struct bpf_sysctl {
>  	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
>  				 * Allows 1,2,4-byte read, but no write.
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a47aae5c7335..87c14edda315 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1374,6 +1374,110 @@ void bpf_timer_cancel_and_free(void *val)
>  	kfree(t);
>  }
>  
> +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type,
> +		     u32 offset, u32 size)
> +{
> +	ptr->data = data;
> +	ptr->offset = offset;
> +	ptr->size = size;
> +	bpf_dynptr_set_type(ptr, type);
> +}
> +
> +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
> +{
> +	memset(ptr, 0, sizeof(*ptr));
> +}
> +
> +BPF_CALL_4(bpf_dynptr_from_mem, void *, data, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
> +{
> +	int err;
> +
> +	err = bpf_dynptr_check_size(size);
> +	if (err)
> +		goto error;
> +
> +	/* flags is currently unsupported */
> +	if (flags) {
> +		err = -EINVAL;
> +		goto error;
> +	}
> +
> +	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_LOCAL, 0, size);
> +
> +	return 0;
> +
> +error:
> +	bpf_dynptr_set_null(ptr);
> +	return err;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_from_mem_proto = {
> +	.func		= bpf_dynptr_from_mem,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_MEM_UNINIT | NO_OBJ_REF,
> +	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
> +	.arg3_type	= ARG_ANYTHING,
> +	.arg4_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL | MEM_UNINIT,
> +};
> +
> +BPF_CALL_3(bpf_dynptr_alloc, u32, size, u64, flags, struct bpf_dynptr_kern *, ptr)
> +{
> +	gfp_t gfp_flags = GFP_ATOMIC;
> +	void *data;
> +	int err;
> +
> +	err = bpf_dynptr_check_size(size);
> +	if (err)
> +		goto error;
> +
> +	if (flags) {
> +		if (flags == __GFP_ZERO) {
> +			gfp_flags |= flags;
> +		} else {
> +			err = -EINVAL;
> +			goto error;
> +		}
> +	}
> +
> +	data = kmalloc(size, gfp_flags);
> +	if (!data) {
> +		err = -ENOMEM;
> +		goto error;
> +	}
> +
> +	bpf_dynptr_init(ptr, data, BPF_DYNPTR_TYPE_MALLOC, 0, size);
> +
> +	return 0;
> +
> +error:
> +	bpf_dynptr_set_null(ptr);
> +	return err;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_alloc_proto = {
> +	.func		= bpf_dynptr_alloc,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_ANYTHING,
> +	.arg2_type	= ARG_ANYTHING,
> +	.arg3_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | MEM_UNINIT,
> +};
> +
> +BPF_CALL_1(bpf_dynptr_put, struct bpf_dynptr_kern *, dynptr)
> +{
> +	kfree(dynptr->data);
> +	bpf_dynptr_set_null(dynptr);
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_dynptr_put_proto = {
> +	.func		= bpf_dynptr_put,
> +	.gpl_only	= false,
> +	.ret_type	= RET_VOID,
> +	.arg1_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_MALLOC | OBJ_RELEASE,
> +};
> +
>  const struct bpf_func_proto bpf_get_current_task_proto __weak;
>  const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
>  const struct bpf_func_proto bpf_probe_read_user_proto __weak;
> @@ -1426,6 +1530,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>  		return &bpf_loop_proto;
>  	case BPF_FUNC_strncmp:
>  		return &bpf_strncmp_proto;
> +	case BPF_FUNC_dynptr_from_mem:
> +		return &bpf_dynptr_from_mem_proto;
> +	case BPF_FUNC_dynptr_alloc:
> +		return &bpf_dynptr_alloc_proto;
> +	case BPF_FUNC_dynptr_put:
> +		return &bpf_dynptr_put_proto;
>  	default:
>  		break;
>  	}
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8deb588a19ce..bf132c6822e4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -187,6 +187,9 @@ struct bpf_verifier_stack_elem {
>  					  POISON_POINTER_DELTA))
>  #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>  
> +/* forward declarations */
> +static bool arg_type_is_mem_size(enum bpf_arg_type type);
> +
>  static bool bpf_map_ptr_poisoned(const struct bpf_insn_aux_data *aux)
>  {
>  	return BPF_MAP_PTR(aux->map_ptr_state) == BPF_MAP_PTR_POISON;
> @@ -257,7 +260,9 @@ struct bpf_call_arg_meta {
>  	struct btf *ret_btf;
>  	u32 ret_btf_id;
>  	u32 subprogno;
> -	bool release_ref;
> +	u8 release_regno;
> +	bool release_dynptr;
> +	u8 uninit_dynptr_regno;
>  };
>  
>  struct btf *btf_vmlinux;
> @@ -576,6 +581,7 @@ static char slot_type_char[] = {
>  	[STACK_SPILL]	= 'r',
>  	[STACK_MISC]	= 'm',
>  	[STACK_ZERO]	= '0',
> +	[STACK_DYNPTR]	= 'd',
>  };
>  
>  static void print_liveness(struct bpf_verifier_env *env,
> @@ -591,6 +597,25 @@ static void print_liveness(struct bpf_verifier_env *env,
>  		verbose(env, "D");
>  }
>  
> +static inline int get_spi(s32 off)
> +{
> +	return (-off - 1) / BPF_REG_SIZE;
> +}
> +
> +static bool is_spi_bounds_valid(struct bpf_func_state *state, int spi, u32 nr_slots)
> +{
> +	int allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +	/* We need to check that slots between [spi - nr_slots + 1, spi] are
> +	 * within [0, allocated_stack).
> +	 *
> +	 * Please note that the spi grows downwards. For example, a dynptr
> +	 * takes the size of two stack slots; the first slot will be at
> +	 * spi and the second slot will be at spi - 1.
> +	 */
> +	return spi - nr_slots + 1 >= 0 && spi < allocated_slots;
> +}
> +
>  static struct bpf_func_state *func(struct bpf_verifier_env *env,
>  				   const struct bpf_reg_state *reg)
>  {
> @@ -642,6 +667,191 @@ static void mark_verifier_state_scratched(struct bpf_verifier_env *env)
>  	env->scratched_stack_slots = ~0ULL;
>  }
>  
> +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_MALLOC)
> +
> +static int arg_to_dynptr_type(enum bpf_arg_type arg_type)
> +{
> +	switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +	case DYNPTR_TYPE_LOCAL:
> +		return BPF_DYNPTR_TYPE_LOCAL;
> +	case DYNPTR_TYPE_MALLOC:
> +		return BPF_DYNPTR_TYPE_MALLOC;
> +	default:
> +		return BPF_DYNPTR_TYPE_INVALID;
> +	}
> +}
> +
> +static inline bool dynptr_type_refcounted(enum bpf_dynptr_type type)
> +{
> +	return type == BPF_DYNPTR_TYPE_MALLOC;
> +}
> +
> +static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				   enum bpf_arg_type arg_type)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	enum bpf_dynptr_type type;
> +	int spi, i;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return -EINVAL;
> +
> +	type = arg_to_dynptr_type(arg_type);
> +	if (type == BPF_DYNPTR_TYPE_INVALID)
> +		return -EINVAL;
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++) {
> +		state->stack[spi].slot_type[i] = STACK_DYNPTR;
> +		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> +	}
> +
> +	state->stack[spi].spilled_ptr.dynptr.type = type;
> +	state->stack[spi - 1].spilled_ptr.dynptr.type = type;
> +
> +	state->stack[spi].spilled_ptr.dynptr.first_slot = true;
> +
> +	return 0;
> +}
> +
> +static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi, i;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return -EINVAL;
> +
> +	for (i = 0; i < BPF_REG_SIZE; i++) {
> +		state->stack[spi].slot_type[i] = STACK_INVALID;
> +		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> +	}
> +
> +	state->stack[spi].spilled_ptr.dynptr.type = 0;
> +	state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> +
> +	state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
> +
> +	return 0;
> +}
> +
> +static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
> +			       struct bpf_reg_state *regs)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	struct bpf_reg_state *reg, *mem_reg = NULL;
> +	enum bpf_arg_type arg_type;
> +	u64 mem_size;
> +	u32 nr_slots;
> +	int i, spi;
> +
> +	/* We must protect against the case where a program tries to do something
> +	 * like this:
> +	 *
> +	 * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> +	 * bpf_dynptr_alloc(16, 0, &ptr);
> +	 * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> +	 *
> +	 * If ptr is a variable on the stack, we must mark the stack slot as
> +	 * dynptr data when a local dynptr to it is created.
> +	 */
> +	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> +		arg_type = fn->arg_type[i];
> +		reg = &regs[BPF_REG_1 + i];
> +
> +		if (base_type(arg_type) == ARG_PTR_TO_MEM) {
> +			if (base_type(reg->type) == PTR_TO_STACK) {
> +				mem_reg = reg;
> +				continue;
> +			}
> +			/* if it's not a PTR_TO_STACK, then we don't need to
> +			 * mark anything since it can never be used as a dynptr.
> +			 * We can just return here since there will always be
> +			 * only one ARG_PTR_TO_MEM in fn.
> +			 */
> +			return 0;

I think the assumption here is that the NO_OBJ_REF flag reduces
ARG_PTR_TO_MEM to a pointer to stack, packet, or map value, right?
Since dynptr can only be on stack, map value and packet memory
cannot be used to store dynptr.
So bpf_dynptr_alloc(16, 0, &ptr); is not possible where &ptr
points to packet or map value?
So that's what the 'return 0' above is doing?
That's probably ok.

Just thinking out loud:
bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
where &local is a dynptr on stack, but &ptr is a map value?
The lifetime of the memory tracked by dynptr is not going
to outlive program execution.
Probably ok too.

> +		} else if (arg_type_is_mem_size(arg_type)) {
> +			mem_size = roundup(reg->var_off.value, BPF_REG_SIZE);
> +		}
> +	}
> +
> +	if (!mem_reg || !mem_size) {
> +		verbose(env, "verifier internal error: invalid ARG_PTR_TO_MEM args for %s\n", __func__);
> +		return -EFAULT;
> +	}
> +
> +	spi = get_spi(mem_reg->off);
> +	if (!is_spi_bounds_valid(state, spi, mem_size)) {
> +		verbose(env, "verifier internal error: variable not initialized on stack in %s\n", __func__);
> +		return -EFAULT;
> +	}
> +
> +	nr_slots = mem_size / BPF_REG_SIZE;
> +	for (i = 0; i < nr_slots; i++)
> +		state->stack[spi - i].spilled_ptr.is_dynptr_data = true;

So the stack is still STACK_INVALID potentially,
but we mark it as is_dynptr_data...
but the data doesn't need to be 8-byte (spill_ptr) aligned.
So the above loop will mark more stack slots as busy than
the actual stack memory the dynptr points to.
Probably ok.
The stack size is just 512. I wonder whether all this complexity
with tracking special stack memory is worth it.
Maybe restrict dynptr_from_mem to point to non-stack PTR_TO_MEM only?

> +
> +	return 0;
> +}
> +
> +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				       bool *is_dynptr_data)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi;
> +
> +	spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> +		return true;
> +
> +	if (state->stack[spi].slot_type[0] == STACK_DYNPTR ||
> +	    state->stack[spi - 1].slot_type[0] == STACK_DYNPTR)
> +		return false;
> +
> +	if (state->stack[spi].spilled_ptr.is_dynptr_data ||
> +	    state->stack[spi - 1].spilled_ptr.is_dynptr_data) {
> +		*is_dynptr_data = true;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> +				     enum bpf_arg_type arg_type)
> +{
> +	struct bpf_func_state *state = func(env, reg);
> +	int spi = get_spi(reg->off);
> +
> +	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +	    state->stack[spi].slot_type[0] != STACK_DYNPTR ||
> +	    state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
> +	    !state->stack[spi].spilled_ptr.dynptr.first_slot)
> +		return false;
> +
> +	/* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> +	if (arg_type == ARG_PTR_TO_DYNPTR)
> +		return true;
> +
> +	return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> +}
> +
> +static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
> +{
> +	int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
> +	int i;
> +
> +	for (i = 0; i < nr_slots; i++) {
> +		if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /* The reg state of a pointer or a bounded scalar was saved when
>   * it was spilled to the stack.
>   */
> @@ -2878,6 +3088,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>  	}
>  
>  	mark_stack_slot_scratched(env, spi);
> +
> +	if (stack_access_into_dynptr(state, spi, size)) {
> +		verbose(env, "direct write into dynptr is not permitted\n");
> +		return -EINVAL;
> +	}
> +
>  	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
>  	    !register_is_null(reg) && env->bpf_capable) {
>  		if (dst_reg != BPF_REG_FP) {
> @@ -2999,6 +3215,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>  		slot = -i - 1;
>  		spi = slot / BPF_REG_SIZE;
>  		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> +
> +		if (*stype == STACK_DYNPTR) {
> +			verbose(env, "direct write into dynptr is not permitted\n");
> +			return -EINVAL;
> +		}
> +
>  		mark_stack_slot_scratched(env, spi);
>  
>  		if (!env->allow_ptr_leaks
> @@ -5141,6 +5363,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
>  	       type == ARG_PTR_TO_LONG;
>  }
>  
> +static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
> +{
> +	return base_type(type) == ARG_PTR_TO_DYNPTR;
> +}
> +
> +static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
> +{
> +	return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
> +}
> +
>  static int int_ptr_type_to_size(enum bpf_arg_type type)
>  {
>  	if (type == ARG_PTR_TO_INT)
> @@ -5278,6 +5510,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
>  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
>  	[ARG_PTR_TO_TIMER]		= &timer_types,
> +	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
>  };
>  
>  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> @@ -5450,10 +5683,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		return err;
>  
>  skip_type_check:
> -	/* check_func_arg_reg_off relies on only one referenced register being
> -	 * allowed for BPF helpers.
> -	 */
>  	if (reg->ref_obj_id) {
> +		if (arg_type & NO_OBJ_REF) {
> +			verbose(env, "Arg #%d cannot be a referenced object\n",
> +				arg + 1);
> +			return -EINVAL;
> +		}
> +
> +		/* check_func_arg_reg_off relies on only one referenced register being
> +		 * allowed for BPF helpers.
> +		 */
>  		if (meta->ref_obj_id) {
>  			verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
>  				regno, reg->ref_obj_id,
> @@ -5463,16 +5702,26 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		meta->ref_obj_id = reg->ref_obj_id;
>  	}
>  	if (arg_type & OBJ_RELEASE) {
> -		if (!reg->ref_obj_id) {
> +		if (arg_type_is_dynptr(arg_type)) {
> +			struct bpf_func_state *state = func(env, reg);
> +			int spi = get_spi(reg->off);
> +
> +			if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> +			    !state->stack[spi].spilled_ptr.id) {
> +				verbose(env, "arg %d is an unacquired reference\n", regno);
> +				return -EINVAL;
> +			}
> +			meta->release_dynptr = true;
> +		} else if (!reg->ref_obj_id) {
>  			verbose(env, "arg %d is an unacquired reference\n", regno);
>  			return -EINVAL;
>  		}
> -		if (meta->release_ref) {
> -			verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
> -				regno);
> +		if (meta->release_regno) {
> +			verbose(env, "verifier internal error: more than one release_regno %u %u\n",
> +				meta->release_regno, regno);
>  			return -EFAULT;
>  		}
> -		meta->release_ref = true;
> +		meta->release_regno = regno;
>  	}
>  
>  	if (arg_type == ARG_CONST_MAP_PTR) {
> @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>  		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
>  
>  		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> +	} else if (arg_type_is_dynptr(arg_type)) {
> +		/* Can't pass in a dynptr at a weird offset */
> +		if (reg->off % BPF_REG_SIZE) {
> +			verbose(env, "cannot pass in non-zero dynptr offset\n");
> +			return -EINVAL;
> +		}
> +
> +		if (arg_type & MEM_UNINIT)  {
> +			bool is_dynptr_data = false;
> +
> +			if (!is_dynptr_reg_valid_uninit(env, reg, &is_dynptr_data)) {
> +				if (is_dynptr_data)
> +					verbose(env, "Arg #%d cannot be a memory reference for another dynptr\n",
> +						arg + 1);
> +				else
> +					verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> +						arg + 1);
> +				return -EINVAL;
> +			}
> +
> +			meta->uninit_dynptr_regno = arg + BPF_REG_1;
> +		} else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> +			const char *err_extra = "";
> +
> +			switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> +			case DYNPTR_TYPE_LOCAL:
> +				err_extra = "local ";
> +				break;
> +			case DYNPTR_TYPE_MALLOC:
> +				err_extra = "malloc ";
> +				break;
> +			default:
> +				break;
> +			}
> +			verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> +				err_extra, arg + 1);
> +			return -EINVAL;
> +		}
>  	} else if (arg_type_is_alloc_size(arg_type)) {
>  		if (!tnum_is_const(reg->var_off)) {
>  			verbose(env, "R%d is not a known constant'\n",
> @@ -6545,6 +6832,28 @@ static int check_reference_leak(struct bpf_verifier_env *env)
>  	return state->acquired_refs ? -EINVAL : 0;
>  }
>  
> +/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
> + * not been released. Dynptrs to local memory do not need to be released.
> + */
> +static int check_dynptr_unreleased(struct bpf_verifier_env *env)
> +{
> +	struct bpf_func_state *state = cur_func(env);
> +	int allocated_slots, i;
> +
> +	allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> +
> +	for (i = 0; i < allocated_slots; i++) {
> +		if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
> +			if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
> +				verbose(env, "spi=%d is an unreleased dynptr\n", i);
> +				return -EINVAL;
> +			}
> +		}
> +	}

I guess it's ok to treat refcounted dynptrs specially like above.
I wonder whether we can reuse check_reference_leak logic?

> +
> +	return 0;
> +}
> +
>  static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
>  				   struct bpf_reg_state *regs)
>  {
> @@ -6686,8 +6995,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			return err;
>  	}
>  
> -	if (meta.release_ref) {
> -		err = release_reference(env, meta.ref_obj_id);
> +	regs = cur_regs(env);
> +
> +	if (meta.uninit_dynptr_regno) {
> +		enum bpf_arg_type type;
> +
> +		/* we write BPF_W bits (4 bytes) at a time */
> +		for (i = 0; i < BPF_DYNPTR_SIZE; i += 4) {
> +			err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
> +					       i, BPF_W, BPF_WRITE, -1, false);

Why 4 bytes at a time?
The dynptr has an 8-byte pointer in there.

> +			if (err)
> +				return err;
> +		}
> +
> +		type = fn->arg_type[meta.uninit_dynptr_regno - BPF_REG_1];
> +
> +		err = mark_stack_slots_dynptr(env, &regs[meta.uninit_dynptr_regno], type);
> +		if (err)
> +			return err;
> +
> +		if (type & DYNPTR_TYPE_LOCAL) {
> +			err = mark_as_dynptr_data(env, fn, regs);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	if (meta.release_regno) {
> +		if (meta.release_dynptr) {
> +			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
> +		} else {
> +			err = release_reference(env, meta.ref_obj_id);
> +		}
>  		if (err) {
>  			verbose(env, "func %s#%d reference has not been acquired before\n",
>  				func_id_name(func_id), func_id);
> @@ -6695,8 +7034,6 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  		}
>  	}
>  
> -	regs = cur_regs(env);
> -
>  	switch (func_id) {
>  	case BPF_FUNC_tail_call:
>  		err = check_reference_leak(env);
> @@ -6704,6 +7041,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  			verbose(env, "tail_call would lead to reference leak\n");
>  			return err;
>  		}
> +		err = check_dynptr_unreleased(env);
> +		if (err) {
> +			verbose(env, "tail_call would lead to dynptr memory leak\n");
> +			return err;
> +		}
>  		break;
>  	case BPF_FUNC_get_local_storage:
>  		/* check that flags argument in get_local_storage(map, flags) is 0,
> @@ -11696,6 +12038,10 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>  
> +				err = check_dynptr_unreleased(env);
> +				if (err)
> +					return err;
> +
>  				if (state->curframe) {
>  					/* exit from nested function */
>  					err = prepare_func_exit(env, &env->insn_idx);
> diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
> index 096625242475..766dcbc73897 100755
> --- a/scripts/bpf_doc.py
> +++ b/scripts/bpf_doc.py
> @@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
>              'struct socket',
>              'struct file',
>              'struct bpf_timer',
> +            'struct bpf_dynptr',
>      ]
>      known_types = {
>              '...',
> @@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
>              'struct socket',
>              'struct file',
>              'struct bpf_timer',
> +            'struct bpf_dynptr',
>      }
>      mapped_types = {
>              'u8': '__u8',
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d14b10b85e51..e339b2697d9a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5143,6 +5143,42 @@ union bpf_attr {
>   *		The **hash_algo** is returned on success,
>   *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
>   *		invalid arguments are passed.
> + *
> + * long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Get a dynptr to local memory *data*.
> + *
> + *		For a dynptr to a dynamic memory allocation, please use
> + *		bpf_dynptr_alloc instead.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		*flags* is currently unused.
> + *	Return
> + *		0 on success, -E2BIG if the size exceeds DYNPTR_MAX_SIZE,
> + *		-EINVAL if flags is not 0.
> + *
> + * long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr)
> + *	Description
> + *		Allocate memory of *size* bytes.
> + *
> + *		Every call to bpf_dynptr_alloc must have a corresponding
> + *		bpf_dynptr_put, regardless of whether the bpf_dynptr_alloc
> + *		succeeded.
> + *
> + *		The maximum *size* supported is DYNPTR_MAX_SIZE.
> + *		Supported *flags* are __GFP_ZERO.
> + *	Return
> + *		0 on success, -ENOMEM if there is not enough memory for the
> + *		allocation, -E2BIG if the size exceeds DYNPTR_MAX_SIZE, -EINVAL
> + *		if the flags are not supported.
> + *
> + * void bpf_dynptr_put(struct bpf_dynptr *ptr)
> + *	Description
> + *		Free memory allocated by bpf_dynptr_alloc.
> + *
> + *		After this operation, *ptr* will be an invalidated dynptr.
> + *	Return
> + *		Void.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -5339,6 +5375,9 @@ union bpf_attr {
>  	FN(copy_from_user_task),	\
>  	FN(skb_set_tstamp),		\
>  	FN(ima_file_hash),		\
> +	FN(dynptr_from_mem),		\
> +	FN(dynptr_alloc),		\
> +	FN(dynptr_put),			\
>  	/* */
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -6486,6 +6525,11 @@ struct bpf_timer {
>  	__u64 :64;
>  } __attribute__((aligned(8)));
>  
> +struct bpf_dynptr {
> +	__u64 :64;
> +	__u64 :64;
> +} __attribute__((aligned(8)));
> +
>  struct bpf_sysctl {
>  	__u32	write;		/* Sysctl is being read (= 0) or written (= 1).
>  				 * Allows 1,2,4-byte read, but no write.
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-22  2:52   ` Alexei Starovoitov
@ 2022-04-26 23:45     ` Joanne Koong
  2022-04-27  1:26       ` Alexei Starovoitov
  2022-04-27  3:48       ` Andrii Nakryiko
  0 siblings, 2 replies; 27+ messages in thread
From: Joanne Koong @ 2022-04-26 23:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Thu, Apr 21, 2022 at 7:52 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Apr 15, 2022 at 11:34:25PM -0700, Joanne Koong wrote:
> > This patch adds 3 new APIs and the bulk of the verifier work for
> > supporting dynamic pointers in bpf.
> >
> > There are different types of dynptrs. This patch starts with the most
> > basic ones, ones that reference a program's local memory
> > (eg a stack variable) and ones that reference memory that is dynamically
> > allocated on behalf of the program. If the memory is dynamically
> > allocated by the program, the program *must* free it before the program
> > exits. This is enforced by the verifier.
> >
> > The added APIs are:
> >
> > long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr);
> > long bpf_dynptr_alloc(u32 size, u64 flags, struct bpf_dynptr *ptr);
> > void bpf_dynptr_put(struct bpf_dynptr *ptr);
> >
> > This patch sets up the verifier to support dynptrs. Dynptrs will always
> > reside on the program's stack frame. As such, their state is tracked
> > in their corresponding stack slot, which includes the type of dynptr
> > (DYNPTR_LOCAL vs. DYNPTR_MALLOC).
> >
> > When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR |
> > MEM_UNINIT), the stack slots corresponding to the frame pointer
> > where the dynptr resides at are marked as STACK_DYNPTR. For helper functions
> > that take in initialized dynptrs (such as the next patch in this series
> > which supports dynptr reads/writes), the verifier enforces that the
> > dynptr has been initialized by checking that their corresponding stack
> > slots have been marked as STACK_DYNPTR. Dynptr release functions
> > (eg bpf_dynptr_put) will clear the stack slots. The verifier enforces at
> > program exit that there are no acquired dynptr stack slots that need
> > to be released.
> >
> > There are other constraints that are enforced by the verifier as
> > well, such as that the dynptr cannot be written to directly by the bpf
> > program or by non-dynptr helper functions. The last patch in this series
> > contains tests that trigger different cases that the verifier needs to
> > successfully reject.
> >
> > For now, local dynptrs cannot point to referenced memory since the
> > memory can be freed anytime. Support for this will be added as part
> > of a separate patchset.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/bpf.h            |  68 +++++-
> >  include/linux/bpf_verifier.h   |  28 +++
> >  include/uapi/linux/bpf.h       |  44 ++++
> >  kernel/bpf/helpers.c           | 110 ++++++++++
> >  kernel/bpf/verifier.c          | 372 +++++++++++++++++++++++++++++++--
> >  scripts/bpf_doc.py             |   2 +
> >  tools/include/uapi/linux/bpf.h |  44 ++++
> >  7 files changed, 654 insertions(+), 14 deletions(-)
> >
[...]
> > +     for (i = 0; i < BPF_REG_SIZE; i++) {
> > +             state->stack[spi].slot_type[i] = STACK_INVALID;
> > +             state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > +     }
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.type = 0;
> > +     state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> > +
> > +     state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
> > +
> > +     return 0;
> > +}
> > +
> > +static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
> > +                            struct bpf_reg_state *regs)
> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     struct bpf_reg_state *reg, *mem_reg = NULL;
> > +     enum bpf_arg_type arg_type;
> > +     u64 mem_size;
> > +     u32 nr_slots;
> > +     int i, spi;
> > +
> > +     /* We must protect against the case where a program tries to do something
> > +      * like this:
> > +      *
> > +      * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> > +      * bpf_dynptr_alloc(16, 0, &ptr);
> > +      * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> > +      *
> > +      * If ptr is a variable on the stack, we must mark the stack slot as
> > +      * dynptr data when a local dynptr to it is created.
> > +      */
> > +     for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > +             arg_type = fn->arg_type[i];
> > +             reg = &regs[BPF_REG_1 + i];
> > +
> > +             if (base_type(arg_type) == ARG_PTR_TO_MEM) {
> > +                     if (base_type(reg->type) == PTR_TO_STACK) {
> > +                             mem_reg = reg;
> > +                             continue;
> > +                     }
> > +                     /* if it's not a PTR_TO_STACK, then we don't need to
> > +                      * mark anything since it can never be used as a dynptr.
> > +                      * We can just return here since there will always be
> > +                      * only one ARG_PTR_TO_MEM in fn.
> > +                      */
> > +                     return 0;
>
> I think the assumption here is that the NO_OBJ_REF flag reduces
> ARG_PTR_TO_MEM to a pointer to the stack, a packet, or a map value, right?
> Since a dynptr can only live on the stack, map value and packet memory
> cannot be used to store a dynptr.
> So bpf_dynptr_alloc(16, 0, &ptr); is not possible where &ptr
> points to a packet or a map value?
> Is that what the 'return 0' above is doing?
> That's probably ok.
>
> Just thinking out loud:
> bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> where &local is a dynptr on stack, but &ptr is a map value?
> The lifetime of the memory tracked by dynptr is not going
> to outlive program execution.
> Probably ok too.
>
After our conversation, I will remove local dynptrs for now.
> > +             } else if (arg_type_is_mem_size(arg_type)) {
> > +                     mem_size = roundup(reg->var_off.value, BPF_REG_SIZE);
> > +             }
> > +     }
> > +
> > +     if (!mem_reg || !mem_size) {
> > +             verbose(env, "verifier internal error: invalid ARG_PTR_TO_MEM args for %s\n", __func__);
> > +             return -EFAULT;
> > +     }
> > +
> > +     spi = get_spi(mem_reg->off);
> > +     if (!is_spi_bounds_valid(state, spi, mem_size)) {
> > +             verbose(env, "verifier internal error: variable not initialized on stack in %s\n", __func__);
> > +             return -EFAULT;
> > +     }
> > +
> > +     nr_slots = mem_size / BPF_REG_SIZE;
> > +     for (i = 0; i < nr_slots; i++)
> > +             state->stack[spi - i].spilled_ptr.is_dynptr_data = true;
>
> So the stack is still STACK_INVALID potentially,
> but we mark it as is_dynptr_data...
> but the data doesn't need to be 8-byte (spill_ptr) aligned.
> So the above loop will mark more stack slots as busy than
> the actual stack memory the dynptr points to.
> Probably ok.
> The stack size is just 512. I wonder whether all this complexity
> with tracking special stack memory is worth it.
> May be restrict dynptr_from_mem to point to non-stack PTR_TO_MEM only?
>
> > +
> > +     return 0;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                    bool *is_dynptr_data)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     int spi;
> > +
> > +     spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > +             return true;
> > +
> > +     if (state->stack[spi].slot_type[0] == STACK_DYNPTR ||
> > +         state->stack[spi - 1].slot_type[0] == STACK_DYNPTR)
> > +             return false;
> > +
> > +     if (state->stack[spi].spilled_ptr.is_dynptr_data ||
> > +         state->stack[spi - 1].spilled_ptr.is_dynptr_data) {
> > +             *is_dynptr_data = true;
> > +             return false;
> > +     }
> > +
> > +     return true;
> > +}
> > +
> > +static bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > +                                  enum bpf_arg_type arg_type)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     int spi = get_spi(reg->off);
> > +
> > +     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > +         state->stack[spi].slot_type[0] != STACK_DYNPTR ||
> > +         state->stack[spi - 1].slot_type[0] != STACK_DYNPTR ||
> > +         !state->stack[spi].spilled_ptr.dynptr.first_slot)
> > +             return false;
> > +
> > +     /* ARG_PTR_TO_DYNPTR takes any type of dynptr */
> > +     if (arg_type == ARG_PTR_TO_DYNPTR)
> > +             return true;
> > +
> > +     return state->stack[spi].spilled_ptr.dynptr.type == arg_to_dynptr_type(arg_type);
> > +}
> > +
> > +static bool stack_access_into_dynptr(struct bpf_func_state *state, int spi, int size)
> > +{
> > +     int nr_slots = roundup(size, BPF_REG_SIZE) / BPF_REG_SIZE;
> > +     int i;
> > +
> > +     for (i = 0; i < nr_slots; i++) {
> > +             if (state->stack[spi - i].slot_type[0] == STACK_DYNPTR)
> > +                     return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  /* The reg state of a pointer or a bounded scalar was saved when
> >   * it was spilled to the stack.
> >   */
> > @@ -2878,6 +3088,12 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >       }
> >
> >       mark_stack_slot_scratched(env, spi);
> > +
> > +     if (stack_access_into_dynptr(state, spi, size)) {
> > +             verbose(env, "direct write into dynptr is not permitted\n");
> > +             return -EINVAL;
> > +     }
> > +
> >       if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
> >           !register_is_null(reg) && env->bpf_capable) {
> >               if (dst_reg != BPF_REG_FP) {
> > @@ -2999,6 +3215,12 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> >               slot = -i - 1;
> >               spi = slot / BPF_REG_SIZE;
> >               stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
> > +
> > +             if (*stype == STACK_DYNPTR) {
> > +                     verbose(env, "direct write into dynptr is not permitted\n");
> > +                     return -EINVAL;
> > +             }
> > +
> >               mark_stack_slot_scratched(env, spi);
> >
> >               if (!env->allow_ptr_leaks
> > @@ -5141,6 +5363,16 @@ static bool arg_type_is_int_ptr(enum bpf_arg_type type)
> >              type == ARG_PTR_TO_LONG;
> >  }
> >
> > +static inline bool arg_type_is_dynptr(enum bpf_arg_type type)
> > +{
> > +     return base_type(type) == ARG_PTR_TO_DYNPTR;
> > +}
> > +
> > +static inline bool arg_type_is_dynptr_uninit(enum bpf_arg_type type)
> > +{
> > +     return arg_type_is_dynptr(type) && (type & MEM_UNINIT);
> > +}
> > +
> >  static int int_ptr_type_to_size(enum bpf_arg_type type)
> >  {
> >       if (type == ARG_PTR_TO_INT)
> > @@ -5278,6 +5510,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >       [ARG_PTR_TO_STACK]              = &stack_ptr_types,
> >       [ARG_PTR_TO_CONST_STR]          = &const_str_ptr_types,
> >       [ARG_PTR_TO_TIMER]              = &timer_types,
> > +     [ARG_PTR_TO_DYNPTR]             = &stack_ptr_types,
> >  };
> >
> >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > @@ -5450,10 +5683,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               return err;
> >
> >  skip_type_check:
> > -     /* check_func_arg_reg_off relies on only one referenced register being
> > -      * allowed for BPF helpers.
> > -      */
> >       if (reg->ref_obj_id) {
> > +             if (arg_type & NO_OBJ_REF) {
> > +                     verbose(env, "Arg #%d cannot be a referenced object\n",
> > +                             arg + 1);
> > +                     return -EINVAL;
> > +             }
> > +
> > +             /* check_func_arg_reg_off relies on only one referenced register being
> > +              * allowed for BPF helpers.
> > +              */
> >               if (meta->ref_obj_id) {
> >                       verbose(env, "verifier internal error: more than one arg with ref_obj_id R%d %u %u\n",
> >                               regno, reg->ref_obj_id,
> > @@ -5463,16 +5702,26 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               meta->ref_obj_id = reg->ref_obj_id;
> >       }
> >       if (arg_type & OBJ_RELEASE) {
> > -             if (!reg->ref_obj_id) {
> > +             if (arg_type_is_dynptr(arg_type)) {
> > +                     struct bpf_func_state *state = func(env, reg);
> > +                     int spi = get_spi(reg->off);
> > +
> > +                     if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS) ||
> > +                         !state->stack[spi].spilled_ptr.id) {
> > +                             verbose(env, "arg %d is an unacquired reference\n", regno);
> > +                             return -EINVAL;
> > +                     }
> > +                     meta->release_dynptr = true;
> > +             } else if (!reg->ref_obj_id) {
> >                       verbose(env, "arg %d is an unacquired reference\n", regno);
> >                       return -EINVAL;
> >               }
> > -             if (meta->release_ref) {
> > -                     verbose(env, "verifier internal error: more than one release_ref arg R%d\n",
> > -                             regno);
> > +             if (meta->release_regno) {
> > +                     verbose(env, "verifier internal error: more than one release_regno %u %u\n",
> > +                             meta->release_regno, regno);
> >                       return -EFAULT;
> >               }
> > -             meta->release_ref = true;
> > +             meta->release_regno = regno;
> >       }
> >
> >       if (arg_type == ARG_CONST_MAP_PTR) {
> > @@ -5565,6 +5814,44 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> >               bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
> >
> >               err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
> > +     } else if (arg_type_is_dynptr(arg_type)) {
> > +             /* Can't pass in a dynptr at a weird offset */
> > +             if (reg->off % BPF_REG_SIZE) {
> > +                     verbose(env, "cannot pass in non-zero dynptr offset\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (arg_type & MEM_UNINIT)  {
> > +                     bool is_dynptr_data = false;
> > +
> > +                     if (!is_dynptr_reg_valid_uninit(env, reg, &is_dynptr_data)) {
> > +                             if (is_dynptr_data)
> > +                                     verbose(env, "Arg #%d cannot be a memory reference for another dynptr\n",
> > +                                             arg + 1);
> > +                             else
> > +                                     verbose(env, "Arg #%d dynptr has to be an uninitialized dynptr\n",
> > +                                             arg + 1);
> > +                             return -EINVAL;
> > +                     }
> > +
> > +                     meta->uninit_dynptr_regno = arg + BPF_REG_1;
> > +             } else if (!is_dynptr_reg_valid_init(env, reg, arg_type)) {
> > +                     const char *err_extra = "";
> > +
> > +                     switch (arg_type & DYNPTR_TYPE_FLAG_MASK) {
> > +                     case DYNPTR_TYPE_LOCAL:
> > +                             err_extra = "local ";
> > +                             break;
> > +                     case DYNPTR_TYPE_MALLOC:
> > +                             err_extra = "malloc ";
> > +                             break;
> > +                     default:
> > +                             break;
> > +                     }
> > +                     verbose(env, "Expected an initialized %sdynptr as arg #%d\n",
> > +                             err_extra, arg + 1);
> > +                     return -EINVAL;
> > +             }
> >       } else if (arg_type_is_alloc_size(arg_type)) {
> >               if (!tnum_is_const(reg->var_off)) {
> >                       verbose(env, "R%d is not a known constant'\n",
> > @@ -6545,6 +6832,28 @@ static int check_reference_leak(struct bpf_verifier_env *env)
> >       return state->acquired_refs ? -EINVAL : 0;
> >  }
> >
> > +/* Called at BPF_EXIT to detect if there are any reference-tracked dynptrs that have
> > + * not been released. Dynptrs to local memory do not need to be released.
> > + */
> > +static int check_dynptr_unreleased(struct bpf_verifier_env *env)
> > +{
> > +     struct bpf_func_state *state = cur_func(env);
> > +     int allocated_slots, i;
> > +
> > +     allocated_slots = state->allocated_stack / BPF_REG_SIZE;
> > +
> > +     for (i = 0; i < allocated_slots; i++) {
> > +             if (state->stack[i].slot_type[0] == STACK_DYNPTR) {
> > +                     if (dynptr_type_refcounted(state->stack[i].spilled_ptr.dynptr.type)) {
> > +                             verbose(env, "spi=%d is an unreleased dynptr\n", i);
> > +                             return -EINVAL;
> > +                     }
> > +             }
> > +     }
>
> I guess it's ok to treat refcnted dynptr special like above.
> I wonder whether we can reuse check_reference_leak logic?
I like this idea! My reason for not storing dynptr reference ids in
state->refs was that it's costly (e.g. we realloc_array every time we
acquire a reference). But thinking about this some more, I like the
idea of keeping everything unified by having all reference ids reside
within state->refs and checking for leaks the same way. Perhaps we can
also optimize acquire_reference_state() by allocating extra space for
state->refs upfront instead of doing a realloc_array on every acquire.

>
> > +
> > +     return 0;
> > +}
> > +
> >  static int check_bpf_snprintf_call(struct bpf_verifier_env *env,
> >                                  struct bpf_reg_state *regs)
> >  {
> > @@ -6686,8 +6995,38 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                       return err;
> >       }
> >
> > -     if (meta.release_ref) {
> > -             err = release_reference(env, meta.ref_obj_id);
> > +     regs = cur_regs(env);
> > +
> > +     if (meta.uninit_dynptr_regno) {
> > +             enum bpf_arg_type type;
> > +
> > +             /* we write BPF_W bits (4 bytes) at a time */
> > +             for (i = 0; i < BPF_DYNPTR_SIZE; i += 4) {
> > +                     err = check_mem_access(env, insn_idx, meta.uninit_dynptr_regno,
> > +                                            i, BPF_W, BPF_WRITE, -1, false);
>
> Why 4 bytes at a time?
> dynptr has an 8 byte pointer in there.
Oh I see. I thought BPF_W was the largest BPF_SIZE, but I see now there
is also BPF_DW, which is 64-bit. I'll change this to BPF_DW.
>

[...]
> > 2.30.2
> >


* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-26 23:45     ` Joanne Koong
@ 2022-04-27  1:26       ` Alexei Starovoitov
  2022-04-27  3:53         ` Andrii Nakryiko
  2022-04-27  3:48       ` Andrii Nakryiko
  1 sibling, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2022-04-27  1:26 UTC (permalink / raw)
  To: Joanne Koong
  Cc: bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 26, 2022 at 4:45 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > I guess it's ok to treat refcnted dynptr special like above.
> > I wonder whether we can reuse check_reference_leak logic?
> I like this idea! My reason for not storing dynptr reference ids in
> state->refs was that it's costly (e.g. we realloc_array every time we
> acquire a reference). But thinking about this some more, I like the
> idea of keeping everything unified by having all reference ids reside
> within state->refs and checking for leaks the same way. Perhaps we can
> optimize acquire_reference_state() as well where we upfront allocate
> more space for state->refs instead of having to do a realloc_array
> every time.

realloc is decently efficient underneath.
Probably not worth micro optimizing for it.
As far as ref state... Looks like the dynptr patch is trying
hard to prevent writes into the stack area where the dynptr
was allocated, then cleans it up after dynptr_put.
For other pointers on stack we just mark the area as stack_misc
only when the stack slot was overwritten.
We don't mark the slot as 'misc' after the pointer was read from stack.
We can use the same approach with dynptr as long as dynptr
leaking is tracked through ref state
(instead of walking every stack slot at the time of bpf_exit).

iirc we've debugged the case where clang reused stack area
with a scalar that was previously used for stack spill.
The dynptr on stack won't be seen as stack spill from compiler pov
but I worry about the case:
struct bpf_dynptr t;
bpf_dynptr_alloc(&t,..);
bpf_dynptr_put(&t);
// compiler thinks the stack area of 't' is dead and reuses
// it for something like scalar.
Even without dynptr_put above the compiler might
see that dynptr_alloc or another function stored
something into dynptr, but if nothing is using that
dynptr later it might consider the stack area as dead.
We cannot mark every dynptr variable as volatile.

Another point to consider...
This patch unconditionally tells the verifier to
unmark_stack_slots_dynptr() after bpf_dynptr_put().
But that's valid only for refcnt=1 -> 0 transition.
I'm not sure that will be forever the case even
for dynptr-s on stack.
If we allow refcnt=2,3,... on stack then
the verifier won't be able to clear stack slots
after bpf_dynptr_put and we will face the stack reuse issue.
I guess the idea is that refcnt-ed dynptr will be only in a map?
That might be inconvenient.
We allow refcnt-ed kptrs to be in a map, in a register,
and spilled to the stack.
Surely, dynptr are more complex in that sense.


* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-26 23:45     ` Joanne Koong
  2022-04-27  1:26       ` Alexei Starovoitov
@ 2022-04-27  3:48       ` Andrii Nakryiko
  1 sibling, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2022-04-27  3:48 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Alexei Starovoitov, bpf, Andrii Nakryiko,
	Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 26, 2022 at 4:45 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Apr 21, 2022 at 7:52 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Apr 15, 2022 at 11:34:25PM -0700, Joanne Koong wrote:
> > > [...]
> > > +     for (i = 0; i < BPF_REG_SIZE; i++) {
> > > +             state->stack[spi].slot_type[i] = STACK_INVALID;
> > > +             state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > > +     }
> > > +
> > > +     state->stack[spi].spilled_ptr.dynptr.type = 0;
> > > +     state->stack[spi - 1].spilled_ptr.dynptr.type = 0;
> > > +
> > > +     state->stack[spi].spilled_ptr.dynptr.first_slot = 0;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int mark_as_dynptr_data(struct bpf_verifier_env *env, const struct bpf_func_proto *fn,
> > > +                            struct bpf_reg_state *regs)
> > > +{
> > > +     struct bpf_func_state *state = cur_func(env);
> > > +     struct bpf_reg_state *reg, *mem_reg = NULL;
> > > +     enum bpf_arg_type arg_type;
> > > +     u64 mem_size;
> > > +     u32 nr_slots;
> > > +     int i, spi;
> > > +
> > > +     /* We must protect against the case where a program tries to do something
> > > +      * like this:
> > > +      *
> > > +      * bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> > > +      * bpf_dynptr_alloc(16, 0, &ptr);
> > > +      * bpf_dynptr_write(&local, 0, corrupt_data, sizeof(ptr));
> > > +      *
> > > +      * If ptr is a variable on the stack, we must mark the stack slot as
> > > +      * dynptr data when a local dynptr to it is created.
> > > +      */
> > > +     for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> > > +             arg_type = fn->arg_type[i];
> > > +             reg = &regs[BPF_REG_1 + i];
> > > +
> > > +             if (base_type(arg_type) == ARG_PTR_TO_MEM) {
> > > +                     if (base_type(reg->type) == PTR_TO_STACK) {
> > > +                             mem_reg = reg;
> > > +                             continue;
> > > +                     }
> > > +                     /* if it's not a PTR_TO_STACK, then we don't need to
> > > +                      * mark anything since it can never be used as a dynptr.
> > > +                      * We can just return here since there will always be
> > > +                      * only one ARG_PTR_TO_MEM in fn.
> > > +                      */
> > > +                     return 0;
> >
> > I think the assumption here is that the NO_OBJ_REF flag reduces
> > ARG_PTR_TO_MEM to a pointer to the stack, a packet, or a map value, right?
> > Since a dynptr can only live on the stack, map value and packet memory
> > cannot be used to store a dynptr.
> > So bpf_dynptr_alloc(16, 0, &ptr); is not possible where &ptr
> > points to a packet or a map value?
> > Is that what the 'return 0' above is doing?
> > That's probably ok.
> >
> > Just thinking out loud:
> > bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local);
> > where &local is a dynptr on stack, but &ptr is a map value?
> > The lifetime of the memory tracked by dynptr is not going
> > to outlive program execution.
> > Probably ok too.
> >
> After our conversation, I will remove local dynptrs for now.


bpf_dynptr_from_mem(&ptr, sizeof(ptr), 0, &local) where ptr is
PTR_TO_MAP_VALUE is still ok. So it's only the special case of ptr being
PTR_TO_STACK that will be disallowed, right? It's still a LOCAL type of
dynptr; it just can't point to memory on the stack.


[...]


* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-27  1:26       ` Alexei Starovoitov
@ 2022-04-27  3:53         ` Andrii Nakryiko
  2022-04-27 23:27           ` Joanne Koong
  0 siblings, 1 reply; 27+ messages in thread
From: Andrii Nakryiko @ 2022-04-27  3:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joanne Koong, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 26, 2022 at 6:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 4:45 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > I guess it's ok to treat refcnted dynptr special like above.
> > > I wonder whether we can reuse check_reference_leak logic?
> > I like this idea! My reason for not storing dynptr reference ids in
> > state->refs was that it's costly (e.g. we realloc_array every time we
> > acquire a reference). But thinking about this some more, I like the
> > idea of keeping everything unified by having all reference ids reside
> > within state->refs and checking for leaks the same way. Perhaps we can
> > optimize acquire_reference_state() as well where we upfront allocate
> > more space for state->refs instead of having to do a realloc_array
> > every time.
>
> realloc is decently efficient underneath.
> Probably not worth micro optimizing for it.
> As far as ref state... Looks like dynptr patch is trying
> hard to prevent writes into the stack area where dynptr
> was allocated. Then cleans it up after dynptr_put.
> For other pointers on stack we just mark the area as stack_misc
> only when the stack slot was overwritten.
> We don't mark the slot as 'misc' after the pointer was read from stack.
> We can use the same approach with dynptr as long as dynptr
> leaking is tracked through ref state
> (instead of for(each stack slot) at the time of bpf_exit)
>
> iirc we've debugged the case where clang reused stack area
> with a scalar that was previously used for stack spill.
> The dynptr on stack won't be seen as stack spill from compiler pov
> but I worry about the case:
> struct bpf_dynptr t;
> bpf_dynptr_alloc(&t,..);
> bpf_dynptr_put(&t);
> // compiler thinks the stack area of 't' is dead and reuses
> // it for something like scalar.
> Even without dynptr_put above the compiler might
> see that dynptr_alloc or another function stored
> something into dynptr, but if nothing is using that
> dynptr later it might consider the stack area as dead.
> We cannot mark every dynptr variable as volatile.
>
> Another point to consider...
> This patch unconditionally tells the verifier to
> unmark_stack_slots_dynptr() after bpf_dynptr_put().
> But that's valid only for refcnt=1 -> 0 transition.
> I'm not sure that will be forever the case even
> for dynptr-s on stack.
> If we allow refcnt=2,3,... on stack then
> the verifier won't be able to clear stack slots
> after bpf_dynptr_put and we will face the stack reuse issue.
> I guess the idea is that refcnt-ed dynptr will be only in a map?
> That might be inconvenient.
> We allow refcnt-ed kptrs to be in a map, in a register,
> and spilled to the stack.
> Surely, dynptr are more complex in that sense.

struct dynptr on the stack isn't by itself refcounted. E.g., if we
have dynptr pointing to PTR_TO_MAP_VALUE there is no refcounting
involved and we don't have to do bpf_dynptr_put(). The really
refcounted case is malloc()'ed memory pointed to by dynptr. But in
this case refcount is stored next to the actual memory, not inside
struct bpf_dynptr. So when we do bpf_dynptr_put() on local struct
dynptr copy, it decrements refcount of malloc()'ed memory. If it was
the last refcnt, then memory is freed. But we can still have other
copies (e.g., in another on-the-stack struct bpf_dynptr copy of BPF
program that runs on another CPU, or inside the map value) which will
keep the allocated memory alive.

bpf_dynptr_put() is just saying "we are done with our local instance
of struct bpf_dynptr and that slot can be reused for something else".
So Clang deciding to reuse that stack slot for something unrelated
after bpf_dynptr_put() should be fine.


* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-27  3:53         ` Andrii Nakryiko
@ 2022-04-27 23:27           ` Joanne Koong
  2022-04-28  1:37             ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Joanne Koong @ 2022-04-27 23:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Andrii Nakryiko,
	Kumar Kartikeya Dwivedi, Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Tue, Apr 26, 2022 at 8:53 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 6:26 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Apr 26, 2022 at 4:45 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > I guess it's ok to treat refcnted dynptr special like above.
> > > > I wonder whether we can reuse check_reference_leak logic?
> > > I like this idea! My reason for not storing dynptr reference ids in
> > > state->refs was because it's costly (eg we realloc_array every time we
> > > acquire a reference). But thinking about this some more, I like the
> > > idea of keeping everything unified by having all reference ids reside
> > > within state->refs and checking for leaks the same way. Perhaps we can
> > > optimize acquire_reference_state() as well where we upfront allocate
> > > more space for state->refs instead of having to do a realloc_array
> > > every time.
> >
> > realloc is decently efficient underneath.
> > Probably not worth micro optimizing for it.
> > As far as ref state... Looks like dynptr patch is trying
> > hard to prevent writes into the stack area where dynptr
> > was allocated. Then cleans it up after dynptr_put.
> > For other pointers on stack we just mark the area as stack_misc
> > only when the stack slot was overwritten.
> > We don't mark the slot as 'misc' after the pointer was read from stack.
> > We can use the same approach with dynptr as long as dynptr
> > leaking is tracked through ref state
> > (instead of for(each stack slot) at the time of bpf_exit)
I think the trade-off with this is that the verifier error message
will be more ambiguous (e.g., if you try to call bpf_dynptr_put, the
message would be something like "arg 1 is an unacquired reference" vs.
a more clear-cut message like "direct write into dynptr is not
permitted" at the erring instruction). But I think that's fine. I will
change it to mark the slot as misc for v3.
> >
> > iirc we've debugged the case where clang reused stack area
> > with a scalar that was previously used for stack spill.
> > The dynptr on stack won't be seen as stack spill from compiler pov
> > but I worry about the case:
> > struct bpf_dynptr t;
> > bpf_dynptr_alloc(&t,..);
> > bpf_dynptr_put(&t);
> > // compiler thinks the stack area of 't' is dead and reuses
> > // it for something like scalar.
> > Even without dynptr_put above the compiler might
> > see that dynptr_alloc or another function stored
> > something into dynptr, but if nothing is using that
> > dynptr later it might consider the stack area as dead.
> > We cannot mark every dynptr variable as volatile.
> >
> > Another point to consider...
> > This patch unconditionally tells the verifier to
> > unmark_stack_slots_dynptr() after bpf_dynptr_put().
> > But that's valid only for refcnt=1 -> 0 transition.
> > I'm not sure that will be forever the case even
> > for dynptr-s on stack.
> > If we allow refcnt=2,3,... on stack then
> > the verifier won't be able to clear stack slots
> > after bpf_dynptr_put and we will face the stack reuse issue.
> > I guess the idea is that refcnt-ed dynptr will be only in a map?
> > That might be inconvenient.
> > We allow refcnt-ed kptrs to be in a map, in a register,
> > and spilled to the stack.
> > Surely, dynptr are more complex in that sense.
>
> struct dynptr on the stack isn't by itself refcounted. E.g., if we
> have dynptr pointing to PTR_TO_MAP_VALUE there is no refcounting
> involved and we don't have to do bpf_dynptr_put(). The really
> refcounted case is malloc()'ed memory pointed to by dynptr. But in
> this case refcount is stored next to the actual memory, not inside
> struct bpf_dynptr. So when we do bpf_dynptr_put() on local struct
> dynptr copy, it decrements refcount of malloc()'ed memory. If it was
> the last refcnt, then memory is freed. But we can still have other
> copies (e.g., in another on-the-stack struct bpf_dynptr copy of BPF
> program that runs on another CPU, or inside the map value) which will
> keep the allocated memory alive.
>
> bpf_dynptr_put() is just saying "we are done with our local instance
> of struct bpf_dynptr and that slot can be reused for something else".
> So Clang deciding to reuse that stack slot for something unrelated
> after bpf_dynptr_put() should be fine.


* Re: [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put
  2022-04-27 23:27           ` Joanne Koong
@ 2022-04-28  1:37             ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2022-04-28  1:37 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Andrii Nakryiko, bpf, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Alexei Starovoitov, Daniel Borkmann,
	Toke Høiland-Jørgensen

On Wed, Apr 27, 2022 at 4:28 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 8:53 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Apr 26, 2022 at 6:26 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Apr 26, 2022 at 4:45 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > I guess it's ok to treat refcnted dynptr special like above.
> > > > > I wonder whether we can reuse check_reference_leak logic?
> > > > I like this idea! My reason for not storing dynptr reference ids in
> > > > state->refs was because it's costly (eg we realloc_array every time we
> > > > acquire a reference). But thinking about this some more, I like the
> > > > idea of keeping everything unified by having all reference ids reside
> > > > within state->refs and checking for leaks the same way. Perhaps we can
> > > > optimize acquire_reference_state() as well where we upfront allocate
> > > > more space for state->refs instead of having to do a realloc_array
> > > > every time.
> > >
> > > realloc is decently efficient underneath.
> > > Probably not worth micro optimizing for it.
> > > As far as ref state... Looks like dynptr patch is trying
> > > hard to prevent writes into the stack area where dynptr
> > > was allocated. Then cleans it up after dynptr_put.
> > > For other pointers on stack we just mark the area as stack_misc
> > > only when the stack slot was overwritten.
> > > We don't mark the slot as 'misc' after the pointer was read from stack.
> > > We can use the same approach with dynptr as long as dynptr
> > > leaking is tracked through ref state
> > > (instead of for(each stack slot) at the time of bpf_exit)
> I think the trade-off with this is that the verifier error message
> will be more ambiguous (eg if you try to call bpf_dynptr_put, the
> message would be something like "arg 1 is an unacquired reference" vs.
> a more clear-cut message like "direct write into dynptr is not
> permitted" at the erring instruction). But I think that's fine. I will
> change it to mark the slot as misc for v3.

I'm trying to say that
"direct write into dynptr is not permitted"
could be just as confusing to users,
because the store instruction into that stack slot
was generated by the compiler as part of some optimization,
and the user has no idea why that code was generated.


end of thread, other threads:[~2022-04-28  1:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-16  6:34 [PATCH bpf-next v2 0/7] Dynamic pointers Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 1/7] bpf: Add MEM_UNINIT as a bpf_type_flag Joanne Koong
2022-04-19  4:59   ` Alexei Starovoitov
2022-04-19 19:26     ` Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 2/7] bpf: Add OBJ_RELEASE " Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 3/7] bpf: Add bpf_dynptr_from_mem, bpf_dynptr_alloc, bpf_dynptr_put Joanne Koong
2022-04-16 17:42   ` Kumar Kartikeya Dwivedi
2022-04-18 22:20     ` Joanne Koong
2022-04-18 23:57       ` Kumar Kartikeya Dwivedi
2022-04-19 19:23         ` Joanne Koong
2022-04-19 20:18           ` Kumar Kartikeya Dwivedi
2022-04-20 21:15             ` Joanne Koong
2022-04-19 20:35   ` Kumar Kartikeya Dwivedi
2022-04-22  2:52   ` Alexei Starovoitov
2022-04-26 23:45     ` Joanne Koong
2022-04-27  1:26       ` Alexei Starovoitov
2022-04-27  3:53         ` Andrii Nakryiko
2022-04-27 23:27           ` Joanne Koong
2022-04-28  1:37             ` Alexei Starovoitov
2022-04-27  3:48       ` Andrii Nakryiko
2022-04-16  6:34 ` [PATCH bpf-next v2 4/7] bpf: Add bpf_dynptr_read and bpf_dynptr_write Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 5/7] bpf: Add dynptr data slices Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 6/7] bpf: Dynptr support for ring buffers Joanne Koong
2022-04-16  6:34 ` [PATCH bpf-next v2 7/7] bpf: Dynptr tests Joanne Koong
2022-04-16  8:13 ` [PATCH bpf-next v2 0/7] Dynamic pointers Kumar Kartikeya Dwivedi
2022-04-16  8:19   ` Kumar Kartikeya Dwivedi
2022-04-18 16:40     ` Joanne Koong
