* [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1)
@ 2023-06-02  2:26 Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 1/9] [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed" Dave Marchevsky
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

This series is the first of two (or more) followups to address issues in the
bpf_refcount shared ownership implementation discovered by Kumar.
Specifically, this series addresses the "bpf_refcount_acquire on non-owning ref
in another tree" scenario described in [0], and does _not_ address issues
raised in [1]. Further followups will address the other issues.

The series can be applied without re-enabling bpf_refcount_acquire calls, which
were disabled in commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc
calls until race conditions are fixed") until all issues are addressed. Some
extra patches are included so that BPF CI tests will exercise test changes in
the series.

Patch contents:
  * Patch 1 reverts earlier disabling of bpf_refcount_acquire calls
    * The selftest added later in the series needs to call bpf_refcount_acquire
    * This patch should not be applied and is included to allow CI to run the
      newly-added test and exercise test changes in patch 6
  * Patches 2 and 3 fix other bugs introduced in the bpf_refcount series which
    were discovered while reproducing the main issue this series addresses
  * Patch 4 fixes the bpf_refcount_acquire issue by making it fallible for
    non-owning references
  * Patch 5 allows KF_DESTRUCTIVE kfuncs to be called while a spinlock is held
    * This patch, and all patches further in the series, should not be
      applied
  * Patch 6 introduces some destructive bpf_testmod kfuncs which the selftest
    added later in the series needs
  * Patch 7 adds a selftest which uses the kfuncs introduced in patch 6 to
    replicate the exact scenario raised by Kumar
  * Patch 8 disables the test added in patch 7
    * This is so the series (aside from DONOTAPPLY patches) can be applied
      without re-enabling bpf_refcount_acquire yet.
  * Patch 9 reverts patch 8 so that CI can run the newly-added test

The first and last patches in the series are included to allow the CI to run
the newly-added tests and should not be applied. The first patch reverts the
earlier disabling of bpf_refcount_acquire calls, as the test reproducing the
"bpf_refcount_acquire on non-owning ref in another tree" scenario obviously
needs to be able to call bpf_refcount_acquire.

While reproducing the scenario Kumar described in [0], which should cause a
refcount use-after-free, two unrelated bugs were found; they are fixed by
patches 2 and 3 of this series.

  [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
  [1]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/

Changelog:

v1 -> v2: lore.kernel.org/bpf/20230504053338.1778690-1-davemarchevsky@fb.com

Patch #s used below refer to the patch's position in v1 unless otherwise
specified.

  * General
    * Rebase onto latest bpf-next

    * All bpf_testmod-related changes (destructive kfuncs, etc) have been
      marked [DONOTAPPLY] in response to Alexei's comments in v1 and followup
      conversations offline. They're still included as part of the series so
      that CI can exercise the fixed functionality.
      * v1's Patch 5 - "selftests/bpf: Add unsafe lock/unlock and refcount_read
        kfuncs to bpf_testmod" - was moved after the patches which are meant to
        be applied to make it more obvious that it shouldn't be applied.

    * 4d585f48ee6b ("bpf: Remove anonymous union in bpf_kfunc_call_arg_meta")
      was shipped separately from this series in response to Alexei's comments
      about the anonymous union in kfunc_call_arg_meta. That patch removes the
      anonymous union and its members (arg_obj_drop, etc) in favor of the
      simpler approach suggested by Alexei in v1. This series'
      kfunc_call_arg_meta changes are modified to follow the new pattern.

Dave Marchevsky (9):
  [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls
    until race conditions are fixed"
  bpf: Set kptr_struct_meta for node param to list and rbtree insert
    funcs
  bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation
  bpf: Make bpf_refcount_acquire fallible for non-owning refs
  [DONOTAPPLY] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called
    under spinlock
  [DONOTAPPLY] selftests/bpf: Add unsafe lock/unlock and refcount_read
    kfuncs to bpf_testmod
  [DONOTAPPLY] selftests/bpf: Add test exercising bpf_refcount_acquire
    race condition
  [DONOTAPPLY] selftests/bpf: Disable newly-added refcounted_kptr_races
    test
  [DONOTAPPLY] Revert "selftests/bpf: Disable newly-added
    refcounted_kptr_races test"

 kernel/bpf/helpers.c                          |  12 +-
 kernel/bpf/verifier.c                         |  55 ++++--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  61 +++++++
 .../bpf/prog_tests/refcounted_kptr.c          | 106 +++++++++++-
 .../selftests/bpf/progs/refcounted_kptr.c     | 160 ++++++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          |   4 +-
 6 files changed, 379 insertions(+), 19 deletions(-)

-- 
2.34.1


* [PATCH v2 bpf-next 1/9] [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 2/9] bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs Dave Marchevsky
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

This patch should not be applied with the rest of the series as the
series doesn't fix all bpf_refcount issues. It is included here to allow
CI to run the test added later in the series.

A followup series which fixes the remaining problems will include a revert
that should be applied.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c                                    | 4 ----
 tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c | 2 ++
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 086b2a14905b..0f04e7fe4843 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10893,10 +10893,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d doesn't point to a type with bpf_refcount field\n", i);
 				return -EINVAL;
 			}
-			if (rec->refcount_off >= 0) {
-				verbose(env, "bpf_refcount_acquire calls are disabled for now\n");
-				return -EINVAL;
-			}
 			meta->arg_btf = reg->btf;
 			meta->arg_btf_id = reg->btf_id;
 			break;
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 595cbf92bff5..2ab23832062d 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -9,8 +9,10 @@
 
 void test_refcounted_kptr(void)
 {
+	RUN_TESTS(refcounted_kptr);
 }
 
 void test_refcounted_kptr_fail(void)
 {
+	RUN_TESTS(refcounted_kptr_fail);
 }
-- 
2.34.1



* [PATCH v2 bpf-next 2/9] bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 1/9] [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed" Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 3/9] bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation Dave Marchevsky
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

In verifier.c, fixup_kfunc_call uses struct bpf_insn_aux_data's
kptr_struct_meta field to pass information about local kptr types to
various helpers and kfuncs at runtime. The recent bpf_refcount series
added a few functions to the set that need this information:

  * bpf_refcount_acquire
    * Needs to know where the refcount field is in order to increment it
  * Graph collection insert kfuncs: bpf_rbtree_add, bpf_list_push_{front,back}
    * Were migrated to possibly fail by the bpf_refcount series. If an
      insert fails, the input node is bpf_obj_drop'd. bpf_obj_drop needs
      the kptr_struct_meta in order to decrement the refcount and properly
      free special fields.

Unfortunately, the verifier handling of collection insert kfuncs was not
modified to actually populate kptr_struct_meta. Accordingly, when the node
input to those kfuncs is passed to bpf_obj_drop, it is done so without the
information necessary to decrement the refcount.

This patch fixes the issue by populating kptr_struct_meta for those
kfuncs.
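
For illustration only (not part of this patch), below is the shape of a
program that exercises the now-fixed path, following the conventions of
tools/testing/selftests/bpf/progs/refcounted_kptr.c (includes, the
private() macro, container_of, and license boilerplate omitted; struct
and function names are made up for the sketch):

  struct node_data {
    long key;
    struct bpf_refcount ref;
    struct bpf_rb_node r;
  };

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, r);

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    return container_of(a, struct node_data, r)->key <
           container_of(b, struct node_data, r)->key;
  }

  SEC("tc")
  long insert_may_drop(void *ctx)
  {
    struct node_data *n = bpf_obj_new(typeof(*n));

    if (!n)
      return 1;
    bpf_spin_lock(&glock);
    /* If the insert fails, bpf_rbtree_add bpf_obj_drop's n internally;
     * that internal drop is what needs the kptr_struct_meta this patch
     * now records for the node argument.
     */
    bpf_rbtree_add(&groot, &n->r, less);
    bpf_spin_unlock(&glock);
    return 0;
  }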

Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0f04e7fe4843..43d26b65a3ad 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10475,6 +10475,8 @@ __process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env,
 			node_off, btf_name_by_offset(reg->btf, t->name_off));
 		return -EINVAL;
 	}
+	meta->arg_btf = reg->btf;
+	meta->arg_btf_id = reg->btf_id;
 
 	if (node_off != field->graph_root.node_offset) {
 		verbose(env, "arg#1 offset=%d, but expected %s at offset=%d in struct %s\n",
@@ -11040,6 +11042,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	    meta.func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
 		release_ref_obj_id = regs[BPF_REG_2].ref_obj_id;
 		insn_aux->insert_off = regs[BPF_REG_2].off;
+		insn_aux->kptr_struct_meta = btf_find_struct_meta(meta.arg_btf, meta.arg_btf_id);
 		err = ref_convert_owning_non_owning(env, release_ref_obj_id);
 		if (err) {
 			verbose(env, "kfunc %s#%d conversion of owning ref to non-owning failed\n",
-- 
2.34.1



* [PATCH v2 bpf-next 3/9] bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 1/9] [DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed" Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 2/9] bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 4/9] bpf: Make bpf_refcount_acquire fallible for non-owning refs Dave Marchevsky
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

Given the pointer to struct bpf_{rb,list}_node within a local kptr and
the byte offset of that field within the kptr struct, the calculation changed
by this patch is meant to find the beginning of the kptr so that it can
be passed to bpf_obj_drop.

Unfortunately instead of doing

  ptr_to_kptr = ptr_to_node_field - offset_bytes

the calculation is erroneously doing

  ptr_to_kptr = ptr_to_node_field - (offset_bytes * sizeof(struct bpf_rb_node))

or the bpf_list_node equivalent.

This happens because the node pointer is typed, so the subtraction is
scaled by the size of the node struct. This patch fixes the calculation by
casting the pointer to void * before subtracting, so the offset is applied
in bytes.
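
For illustration only (not part of this patch), a tiny userspace C program
demonstrating the scaling behavior; the struct names are stand-ins, and
the void * arithmetic relies on the GNU extension the kernel already uses:

  #include <stddef.h>
  #include <stdio.h>

  struct fake_rb_node { unsigned long p[3]; };  /* stand-in for struct bpf_rb_node */
  struct node_data { long key; struct fake_rb_node r; };

  int main(void)
  {
    struct node_data d;
    struct fake_rb_node *n = &d.r;
    size_t off = offsetof(struct node_data, r);

    /* Typed-pointer arithmetic scales by sizeof(*n); this mirrors the
     * bug and does not land on the start of d.
     */
    void *wrong = (void *)(n - off);
    /* Casting to void * first makes the subtraction byte-wise, which is
     * what the fixed code does.
     */
    void *right = (void *)n - off;

    printf("wrong recovers &d: %d, right recovers &d: %d\n",
           wrong == (void *)&d, right == (void *)&d);
    return 0;
  }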

Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4ef4c4f8a355..a4e437eabcb4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1950,7 +1950,7 @@ static int __bpf_list_add(struct bpf_list_node *node, struct bpf_list_head *head
 		INIT_LIST_HEAD(h);
 	if (!list_empty(n)) {
 		/* Only called from BPF prog, no need to migrate_disable */
-		__bpf_obj_drop_impl(n - off, rec);
+		__bpf_obj_drop_impl((void *)n - off, rec);
 		return -EINVAL;
 	}
 
@@ -2032,7 +2032,7 @@ static int __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
 
 	if (!RB_EMPTY_NODE(n)) {
 		/* Only called from BPF prog, no need to migrate_disable */
-		__bpf_obj_drop_impl(n - off, rec);
+		__bpf_obj_drop_impl((void *)n - off, rec);
 		return -EINVAL;
 	}
 
-- 
2.34.1



* [PATCH v2 bpf-next 4/9] bpf: Make bpf_refcount_acquire fallible for non-owning refs
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (2 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 3/9] bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 5/9] [DONOTAPPLY] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called under spinlock Dave Marchevsky
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky,
	Kumar Kartikeya Dwivedi

This patch fixes an incorrect assumption made in the original
bpf_refcount series [0], specifically that the BPF program calling
bpf_refcount_acquire on some node can always guarantee that the node is
alive. In that series, the patch adding failure behavior to rbtree_add
and list_push_{front, back} breaks this assumption for non-owning
references.

Consider the following program:

  n = bpf_kptr_xchg(&mapval, NULL);
  /* skip error checking */

  bpf_spin_lock(&l);
  if (bpf_rbtree_add(&t, &n->rb, less)) {
    bpf_refcount_acquire(n);
    /* Failed to add, do something else with the node */
  }
  bpf_spin_unlock(&l);

It's incorrect to assume that bpf_refcount_acquire will always succeed in this
scenario. bpf_refcount_acquire is being called in a critical section
here, but the lock being held is associated with rbtree t, which isn't
necessarily the lock associated with the tree that the node is already
in. So after bpf_rbtree_add fails to add the node and internally calls
bpf_obj_drop on it, the program has no ownership of the node's lifetime.
Therefore the node's refcount can be decr'd to 0 at any time after the
failing rbtree_add. If this happens before the refcount_acquire above, the
node might be free'd, and regardless refcount_acquire will be incrementing
a refcount of 0.

Later patches in the series exercise this scenario, resulting in the
expected complaint from the kernel (without this patch's changes):

  refcount_t: addition on 0; use-after-free.
  WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110
  Modules linked in: bpf_testmod(O)
  CPU: 1 PID: 207 Comm: test_progs Tainted: G           O       6.3.0-rc7-02231-g723de1a718a2-dirty #371
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
  RIP: 0010:refcount_warn_saturate+0xbc/0x110
  Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7
  RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082
  RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
  RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680
  RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7
  R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388
  R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048
  FS:  00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   bpf_refcount_acquire_impl+0xb5/0xc0

  (rest of output snipped)

The patch addresses this by changing bpf_refcount_acquire_impl to use
refcount_inc_not_zero instead of refcount_inc and marking
bpf_refcount_acquire KF_RET_NULL.

For owning references, though, we know the above scenario is not possible
and thus that bpf_refcount_acquire will always succeed. Some verifier
bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire
calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on
owning refs despite it being marked KF_RET_NULL.

Existing selftests using bpf_refcount_acquire are modified where
necessary to NULL-check its return value.
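
With bpf_refcount_acquire now fallible on the non-owning path, callers
must check for NULL. A condensed sketch of the updated
rbtree_refcounted_node_ref_escapes selftest (error handling abbreviated):

  n = bpf_obj_new(typeof(*n));
  if (!n)
    return 1;

  bpf_spin_lock(&alock);
  bpf_rbtree_add(&aroot, &n->node, less_a);
  m = bpf_refcount_acquire(n);  /* n is a non-owning ref here, may fail */
  bpf_spin_unlock(&alock);
  if (!m)                       /* NULL check now required */
    return 2;

  m->key = 2;
  bpf_obj_drop(m);
  return 0;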

  [0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/

Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c                          |  8 ++++--
 kernel/bpf/verifier.c                         | 26 +++++++++++++------
 .../selftests/bpf/progs/refcounted_kptr.c     |  2 ++
 .../bpf/progs/refcounted_kptr_fail.c          |  4 ++-
 4 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a4e437eabcb4..9e80efa59a5d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1933,8 +1933,12 @@ __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta
 	 * bpf_refcount type so that it is emitted in vmlinux BTF
 	 */
 	ref = (struct bpf_refcount *)(p__refcounted_kptr + meta->record->refcount_off);
+	if (!refcount_inc_not_zero((refcount_t *)ref))
+		return NULL;
 
-	refcount_inc((refcount_t *)ref);
+	/* Verifier strips KF_RET_NULL if input is owned ref, see is_kfunc_ret_null
+	 * in verifier.c
+	 */
 	return (void *)p__refcounted_kptr;
 }
 
@@ -2406,7 +2410,7 @@ BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_push_front_impl)
 BTF_ID_FLAGS(func, bpf_list_push_back_impl)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 43d26b65a3ad..48c3e2bbcc4a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -298,16 +298,19 @@ struct bpf_kfunc_call_arg_meta {
 		bool found;
 	} arg_constant;
 
-	/* arg_btf and arg_btf_id are used by kfunc-specific handling,
+	/* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
 	 * generally to pass info about user-defined local kptr types to later
 	 * verification logic
 	 *   bpf_obj_drop
 	 *     Record the local kptr type to be drop'd
 	 *   bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
-	 *     Record the local kptr type to be refcount_incr'd
+	 *     Record the local kptr type to be refcount_incr'd and use
+	 *     arg_owning_ref to determine whether refcount_acquire should be
+	 *     fallible
 	 */
 	struct btf *arg_btf;
 	u32 arg_btf_id;
+	bool arg_owning_ref;
 
 	struct {
 		struct btf_field *field;
@@ -9678,11 +9681,6 @@ static bool is_kfunc_acquire(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_ACQUIRE;
 }
 
-static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
-{
-	return meta->kfunc_flags & KF_RET_NULL;
-}
-
 static bool is_kfunc_release(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->kfunc_flags & KF_RELEASE;
@@ -9998,6 +9996,16 @@ BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 
+static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
+{
+	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
+	    meta->arg_owning_ref) {
+		return false;
+	}
+
+	return meta->kfunc_flags & KF_RET_NULL;
+}
+
 static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
 {
 	return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock];
@@ -10880,10 +10888,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			meta->subprogno = reg->subprogno;
 			break;
 		case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
-			if (!type_is_ptr_alloc_obj(reg->type) && !type_is_non_owning_ref(reg->type)) {
+			if (!type_is_ptr_alloc_obj(reg->type)) {
 				verbose(env, "arg#%d is neither owning or non-owning ref\n", i);
 				return -EINVAL;
 			}
+			if (!type_is_non_owning_ref(reg->type))
+				meta->arg_owning_ref = true;
 
 			rec = reg_btf_record(reg);
 			if (!rec) {
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index 1d348a225140..a3da610b1e6b 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -375,6 +375,8 @@ long rbtree_refcounted_node_ref_escapes(void *ctx)
 	bpf_rbtree_add(&aroot, &n->node, less_a);
 	m = bpf_refcount_acquire(n);
 	bpf_spin_unlock(&alock);
+	if (!m)
+		return 2;
 
 	m->key = 2;
 	bpf_obj_drop(m);
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
index efcb308f80ad..0b09e5c915b1 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
@@ -29,7 +29,7 @@ static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference id=3 alloc_insn=21")
+__failure __msg("Unreleased reference id=4 alloc_insn=21")
 long rbtree_refcounted_node_ref_escapes(void *ctx)
 {
 	struct node_acquire *n, *m;
@@ -43,6 +43,8 @@ long rbtree_refcounted_node_ref_escapes(void *ctx)
 	/* m becomes an owning ref but is never drop'd or added to a tree */
 	m = bpf_refcount_acquire(n);
 	bpf_spin_unlock(&glock);
+	if (!m)
+		return 2;
 
 	m->key = 2;
 	return 0;
-- 
2.34.1



* [PATCH v2 bpf-next 5/9] [DONOTAPPLY] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called under spinlock
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (3 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 4/9] bpf: Make bpf_refcount_acquire fallible for non-owning refs Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 6/9] [DONOTAPPLY] selftests/bpf: Add unsafe lock/unlock and refcount_read kfuncs to bpf_testmod Dave Marchevsky
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

In order to prevent deadlock the verifier currently disallows any
function calls under bpf_spin_lock save for a small set of allowlisted
helpers/kfuncs. A BPF program that calls destructive kfuncs might be
trying to cause deadlock, and regardless is understood to be capable of
causing system breakage of similar severity. Per kfuncs.rst:

  The KF_DESTRUCTIVE flag is used to indicate functions calling which is
  destructive to the system. For example such a call can result in system
  rebooting or panicking. Due to this additional restrictions apply to these
  calls.

Preventing BPF programs from crashing or otherwise blowing up the system
is generally the verifier's goal, but destructive kfuncs may have exactly
such a broken state as their intended result. Preventing KF_DESTRUCTIVE
kfunc calls under spinlock in the name of safety is therefore unnecessarily
strict. This patch modifies the "function calls are not allowed while
holding a lock" check to allow calling destructive kfuncs with an
active_lock.

The motivating usecase for this change - unsafe locking of
bpf_spin_locks for easy testing of race conditions - is implemented in
the next two patches in the series.

Note that the removed insn->off check was rejecting any calls to kfuncs
defined in non-vmlinux BTF. In order to get the system in a broken or
otherwise interesting state for inspection, developers might load a
module implementing destructive kfuncs particular to their usecase. The
unsafe_spin_{lock, unlock} kfuncs later in this series are a good
example: there's no clear reason for them to be in vmlinux as they're
specifically for BPF selftests, so they live in bpf_testmod. The check
is removed in favor of a newly-added helper function to enable such
usecases.
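
For illustration only (not part of this patch), the kind of program the
relaxed check is meant to permit. The kfunc declarations mirror the
bpf_testmod kfuncs added two patches later in this series; selftest
boilerplate (includes, private() macro, license) is omitted, and loading
a program that calls a KF_DESTRUCTIVE kfunc still requires CAP_SYS_BOOT:

  /* Provided by bpf_testmod (added later in this series) */
  extern void bpf__unsafe_spin_lock(void *lock__ign) __ksym;
  extern void bpf__unsafe_spin_unlock(void *lock__ign) __ksym;

  private(A) struct bpf_spin_lock glock;
  private(B) struct bpf_spin_lock raw_lock;

  SEC("tc")
  long destructive_under_lock(void *ctx)
  {
    bpf_spin_lock(&glock);
    /* Previously rejected with "function calls are not allowed while
     * holding a lock": a non-vmlinux, non-graph-API kfunc call inside a
     * bpf_spin_lock critical section. Now allowed because the kfunc is
     * KF_DESTRUCTIVE.
     */
    bpf__unsafe_spin_lock(&raw_lock);
    bpf__unsafe_spin_unlock(&raw_lock);
    bpf_spin_unlock(&glock);
    return 0;
  }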

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 48c3e2bbcc4a..1bf0e6411feb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -330,6 +330,11 @@ struct bpf_kfunc_call_arg_meta {
 	u64 mem_size;
 };
 
+static int fetch_kfunc_meta(struct bpf_verifier_env *env,
+			    struct bpf_insn *insn,
+			    struct bpf_kfunc_call_arg_meta *meta,
+			    const char **kfunc_name);
+
 struct btf *btf_vmlinux;
 
 static DEFINE_MUTEX(bpf_verifier_lock);
@@ -10313,6 +10318,21 @@ static bool is_rbtree_lock_required_kfunc(u32 btf_id)
 	return is_bpf_rbtree_api_kfunc(btf_id);
 }
 
+static bool is_kfunc_callable_in_spinlock(struct bpf_verifier_env *env,
+					  struct bpf_insn *insn)
+{
+	struct bpf_kfunc_call_arg_meta meta;
+
+	/* insn->off is idx into btf fd_array - 0 for vmlinux btf, else nonzero */
+	if (!insn->off && is_bpf_graph_api_kfunc(insn->imm))
+		return true;
+
+	if (fetch_kfunc_meta(env, insn, &meta, NULL))
+		return false;
+
+	return is_kfunc_destructive(&meta);
+}
+
 static bool check_kfunc_is_graph_root_api(struct bpf_verifier_env *env,
 					  enum btf_field_type head_field_type,
 					  u32 kfunc_btf_id)
@@ -16218,7 +16238,7 @@ static int do_check(struct bpf_verifier_env *env)
 					if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) ||
 					    (insn->src_reg == BPF_PSEUDO_CALL) ||
 					    (insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
-					     (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) {
+					     !is_kfunc_callable_in_spinlock(env, insn))) {
 						verbose(env, "function calls are not allowed while holding a lock\n");
 						return -EINVAL;
 					}
-- 
2.34.1



* [PATCH v2 bpf-next 6/9] [DONOTAPPLY] selftests/bpf: Add unsafe lock/unlock and refcount_read kfuncs to bpf_testmod
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (4 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 5/9] [DONOTAPPLY] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called under spinlock Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 7/9] [DONOTAPPLY] selftests/bpf: Add test exercising bpf_refcount_acquire race condition Dave Marchevsky
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

[
RFC: This patch currently copies static inline helpers:

  __bpf_spin_lock
  __bpf_spin_unlock
  __bpf_spin_lock_irqsave
  __bpf_spin_unlock_irqrestore

from kernel/bpf/helpers.c. The definition of these helpers is
config-dependent and they're not meant to be called from a module, so I'm
not sure how to proceed here.
]

This patch adds three unsafe kfuncs to bpf_testmod for use in
selftests:

  - bpf__unsafe_spin_lock
  - bpf__unsafe_spin_unlock
  - bpf_refcount_read

The first two are equivalent to bpf_spin_{lock, unlock}, except without
any special treatment from the verifier, which allows them to be used in
tests to guarantee a specific interleaving of program execution. This
will simplify testing race conditions in BPF programs, as demonstrated
in further patches in the series. The kfuncs are marked KF_DESTRUCTIVE
as they can easily cause deadlock, and are only intended to be used in
tests.

bpf_refcount_read simply reads the refcount from the uapi-opaque
bpf_refcount struct and returns it. This allows more precise testing of
specific bpf_refcount scenarios, also demonstrated in further patches in
the series. Although this kfunc can't break the system as
catastrophically as the unsafe locking kfuncs, it's also marked
KF_DESTRUCTIVE as it relies on bpf_refcount implementation details, and
shouldn't be used outside of tests regardless.
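
For illustration only, a sketch of how a selftest BPF program declares
and uses these kfuncs; the extern declarations mirror the ones the next
patch adds to progs/refcounted_kptr.c, while the node layout, lock, and
program body are made up for the sketch:

  /* Provided by bpf_testmod */
  extern void bpf__unsafe_spin_lock(void *lock__ign) __ksym;
  extern void bpf__unsafe_spin_unlock(void *lock__ign) __ksym;
  extern volatile int bpf_refcount_read(void *refcount__ign) __ksym;

  private(A) struct bpf_spin_lock raw_lock;

  SEC("tc")
  long peek_refcount(void *ctx)
  {
    struct node_data *n = bpf_obj_new(typeof(*n));
    int refs;

    if (!n)
      return 1;
    bpf__unsafe_spin_lock(&raw_lock);   /* no verifier lock tracking */
    refs = bpf_refcount_read(&n->ref);  /* freshly allocated node: expect 1 */
    bpf__unsafe_spin_unlock(&raw_lock);
    bpf_obj_drop(n);
    return refs;
  }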

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 61 +++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index cf216041876c..abac7a212ec2 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -109,6 +109,64 @@ __bpf_kfunc void bpf_iter_testmod_seq_destroy(struct bpf_iter_testmod_seq *it)
 	it->cnt = 0;
 }
 
+/* BEGIN copied from kernel/bpf/helpers.c */
+static DEFINE_PER_CPU(unsigned long, irqsave_flags);
+
+static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
+{
+        arch_spinlock_t *l = (void *)lock;
+        union {
+                __u32 val;
+                arch_spinlock_t lock;
+        } u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
+
+        compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
+        BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
+        BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
+        arch_spin_lock(l);
+}
+
+static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
+{
+        arch_spinlock_t *l = (void *)lock;
+
+        arch_spin_unlock(l);
+}
+
+static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
+{
+        unsigned long flags;
+
+        local_irq_save(flags);
+        __bpf_spin_lock(lock);
+        __this_cpu_write(irqsave_flags, flags);
+}
+
+static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
+{
+        unsigned long flags;
+
+        flags = __this_cpu_read(irqsave_flags);
+        __bpf_spin_unlock(lock);
+        local_irq_restore(flags);
+}
+/* END copied from kernel/bpf/helpers.c */
+
+__bpf_kfunc void bpf__unsafe_spin_lock(void *lock__ign)
+{
+	__bpf_spin_lock_irqsave((struct bpf_spin_lock *)lock__ign);
+}
+
+__bpf_kfunc void bpf__unsafe_spin_unlock(void *lock__ign)
+{
+	__bpf_spin_unlock_irqrestore((struct bpf_spin_lock *)lock__ign);
+}
+
+__bpf_kfunc int bpf_refcount_read(void *refcount__ign)
+{
+	return refcount_read((refcount_t *)refcount__ign);
+}
+
 struct bpf_testmod_btf_type_tag_1 {
 	int a;
 };
@@ -283,6 +341,9 @@ BTF_SET8_START(bpf_testmod_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_testmod_seq_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf__unsafe_spin_lock, KF_DESTRUCTIVE)
+BTF_ID_FLAGS(func, bpf__unsafe_spin_unlock, KF_DESTRUCTIVE)
+BTF_ID_FLAGS(func, bpf_refcount_read, KF_DESTRUCTIVE)
 BTF_SET8_END(bpf_testmod_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_testmod_common_kfunc_set = {
-- 
2.34.1



* [PATCH v2 bpf-next 7/9] [DONOTAPPLY] selftests/bpf: Add test exercising bpf_refcount_acquire race condition
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (5 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 6/9] [DONOTAPPLY] selftests/bpf: Add unsafe lock/unlock and refcount_read kfuncs to bpf_testmod Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 8/9] [DONOTAPPLY] selftests/bpf: Disable newly-added refcounted_kptr_races test Dave Marchevsky
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

The selftest added in this patch exercises the exact scenario described
by Kumar in [0] and fixed by earlier patches in this series. The long
comment added in progs/refcounted_kptr.c restates the use-after-free
scenario.

The added test uses bpf__unsafe_spin_{lock, unlock} to force the
specific problematic interleaving we're interested in testing, and
bpf_refcount_read to confirm refcount incr/decr work as expected.

  [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../bpf/prog_tests/refcounted_kptr.c          | 104 +++++++++++-
 .../selftests/bpf/progs/refcounted_kptr.c     | 158 ++++++++++++++++++
 2 files changed, 261 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 2ab23832062d..e7fcc1dd8864 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -1,8 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
-
+#define _GNU_SOURCE
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <pthread.h>
+#include <sched.h>
 
 #include "refcounted_kptr.skel.h"
 #include "refcounted_kptr_fail.skel.h"
@@ -16,3 +18,103 @@ void test_refcounted_kptr_fail(void)
 {
 	RUN_TESTS(refcounted_kptr_fail);
 }
+
+static void force_cpu(pthread_t thread, int cpunum)
+{
+	cpu_set_t cpuset;
+	int err;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpunum, &cpuset);
+	err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
+	if (!ASSERT_OK(err, "pthread_setaffinity_np"))
+		return;
+}
+
+struct refcounted_kptr *skel;
+
+static void *run_unstash_acq_ref(void *unused)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	long ret, unstash_acq_ref_fd;
+	force_cpu(pthread_self(), 1);
+
+	unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount);
+
+	ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts);
+	ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval");
+	ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3");
+	ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4");
+	ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5");
+	pthread_exit((void *)ret);
+}
+
+void test_refcounted_kptr_races(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd;
+	int add_stash_fd, remove_tree_fd;
+	pthread_t thread_id;
+	int ret;
+
+	force_cpu(pthread_self(), 0);
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash);
+	remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree);
+	ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock);
+	ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock);
+	rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock);
+
+	ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "rem_node_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(add_stash_fd, &opts);
+	if (!ASSERT_OK(ret, "add_stash"))
+		return;
+	if (!ASSERT_OK(opts.retval, "add_stash retval"))
+		return;
+
+	ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL);
+	if (!ASSERT_OK(ret, "pthread_create"))
+		goto cleanup;
+
+	force_cpu(thread_id, 1);
+
+	/* This program will execute before unstash_acq_ref's refcount_acquire, then
+	 * unstash_acq_ref can proceed after unsafe_unlock
+	 */
+	ret = bpf_prog_test_run_opts(remove_tree_fd, &opts);
+	if (!ASSERT_OK(ret, "remove_tree"))
+		goto cleanup;
+
+	ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_unlock"))
+		goto cleanup;
+
+	ret = pthread_join(thread_id, NULL);
+	if (!ASSERT_OK(ret, "pthread_join"))
+		goto cleanup;
+
+	refcounted_kptr__destroy(skel);
+	return;
+cleanup:
+	bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	refcounted_kptr__destroy(skel);
+	return;
+}
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index a3da610b1e6b..2951f45291c1 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -39,9 +39,20 @@ private(A) struct bpf_spin_lock lock;
 private(A) struct bpf_rb_root root __contains(node_data, r);
 private(A) struct bpf_list_head head __contains(node_data, l);
 
+private(C) struct bpf_spin_lock lock2;
+private(C) struct bpf_rb_root root2 __contains(node_data, r);
+
 private(B) struct bpf_spin_lock alock;
 private(B) struct bpf_rb_root aroot __contains(node_acquire, node);
 
+private(D) struct bpf_spin_lock ref_acq_lock;
+private(E) struct bpf_spin_lock rem_node_lock;
+
+/* Provided by bpf_testmod */
+extern void bpf__unsafe_spin_lock(void *lock__ign) __ksym;
+extern void bpf__unsafe_spin_unlock(void *lock__ign) __ksym;
+extern volatile int bpf_refcount_read(void *refcount__ign) __ksym;
+
 static bool less(struct bpf_rb_node *node_a, const struct bpf_rb_node *node_b)
 {
 	struct node_data *a;
@@ -405,4 +416,151 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
 	return 0;
 }
 
+SEC("tc")
+long unsafe_ref_acq_lock(void *ctx)
+{
+	bpf__unsafe_spin_lock(&ref_acq_lock);
+	return 0;
+}
+
+SEC("tc")
+long unsafe_ref_acq_unlock(void *ctx)
+{
+	bpf__unsafe_spin_unlock(&ref_acq_lock);
+	return 0;
+}
+
+SEC("tc")
+long unsafe_rem_node_lock(void *ctx)
+{
+	bpf__unsafe_spin_lock(&rem_node_lock);
+	return 0;
+}
+
+/* The following 3 progs are used in concert to test a bpf_refcount-related
+ * race. Consider the following pseudocode interleaving of rbtree operations:
+ *
+ * (Assumptions: n, m, o, p, q are pointers to nodes, t1 and t2 are different
+ * rbtrees, l1 and l2 are locks accompanying the trees, mapval is some
+ * kptr_xchg'able ptr_to_map_value. A single node is being manipulated by both
+ * programs. Irrelevant error-checking and casting is omitted.)
+ *
+ *               CPU 0                               CPU 1
+ *     ----------------------------------|---------------------------
+ *     n = bpf_obj_new  [0]              |
+ *     lock(l1)                          |
+ *     bpf_rbtree_add(t1, &n->r, less)   |
+ *     m = bpf_refcount_acquire(n)  [1]  |
+ *     unlock(l1)                        |
+ *     kptr_xchg(mapval, m)         [2]  |
+ *     --------------------------------------------------------------
+ *                                       |    o = kptr_xchg(mapval, NULL)  [3]
+ *                                       |    lock(l2)
+ *                                       |    rbtree_add(t2, &o->r, less)  [4]
+ *     --------------------------------------------------------------
+ *     lock(l1)                          |
+ *     p = rbtree_first(t1)              |
+ *     p = rbtree_remove(t1, p)          |
+ *     unlock(l1)                        |
+ *     if (p)                            |
+ *       bpf_obj_drop(p)  [5]            |
+ *     --------------------------------------------------------------
+ *                                       |    q = bpf_refcount_acquire(o)  [6]
+ *                                       |    unlock(l2)
+ *
+ * If bpf_refcount_acquire can't fail, the sequence of operations on the node's
+ * refcount is:
+ *    [0] - refcount initialized to 1
+ *    [1] - refcount bumped to 2
+ *    [2] - refcount is still 2, but m's ownership passed to mapval
+ *    [3] - refcount is still 2, mapval's ownership passed to o
+ *    [4] - refcount is decr'd to 1, rbtree_add fails, node is already in t1
+ *          o is converted to non-owning reference
+ *    [5] - refcount is decr'd to 0, node free'd
+ *    [6] - refcount is incr'd to 1 from 0, ERROR
+ *
+ * To prevent [6] bpf_refcount_acquire was made failable. This interleaving is
+ * used to test failable refcount_acquire.
+ *
+ * The two halves of CPU 0's operations are implemented by
+ * add_refcounted_node_to_tree_and_stash and remove_refcounted_node_from_tree.
+ * We can't do the same for CPU 1's operations due to l2 critical section.
+ * Instead, bpf__unsafe_spin_{lock, unlock} are used to ensure the expected
+ * order of operations.
+ */
+
+SEC("tc")
+long add_refcounted_node_to_tree_and_stash(void *ctx)
+{
+	long err;
+
+	err = __stash_map_insert_tree(0, 42, &root, &lock);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+SEC("tc")
+long remove_refcounted_node_from_tree(void *ctx)
+{
+	long ret = 0;
+
+	/* rem_node_lock is held by another program to force race */
+	bpf__unsafe_spin_lock(&rem_node_lock);
+	ret = __read_from_tree(&root, &lock, true);
+	if (ret != 42)
+		return ret;
+
+	bpf__unsafe_spin_unlock(&rem_node_lock);
+	return 0;
+}
+
+/* ref_check_n numbers correspond to refcount operation points in comment above */
+int ref_check_3, ref_check_4, ref_check_5;
+
+SEC("tc")
+long unstash_add_and_acquire_refcount(void *ctx)
+{
+	struct map_value *mapval;
+	struct node_data *n, *m;
+	int idx = 0;
+
+	mapval = bpf_map_lookup_elem(&stashed_nodes, &idx);
+	if (!mapval)
+		return -1;
+
+	n = bpf_kptr_xchg(&mapval->node, NULL);
+	if (!n)
+		return -2;
+	ref_check_3 = bpf_refcount_read(&n->ref);
+
+	bpf_spin_lock(&lock2);
+	bpf_rbtree_add(&root2, &n->r, less);
+	ref_check_4 = bpf_refcount_read(&n->ref);
+
+	/* Let CPU 0 do first->remove->drop */
+	bpf__unsafe_spin_unlock(&rem_node_lock);
+
+	/* ref_acq_lock is held by another program to force race
+	 * when this program holds the lock, remove_refcounted_node_from_tree
+	 * has finished
+	 */
+	bpf__unsafe_spin_lock(&ref_acq_lock);
+	ref_check_5 = bpf_refcount_read(&n->ref);
+
+	/* Error-causing use-after-free incr ([6] in long comment above) */
+	m = bpf_refcount_acquire(n);
+	bpf__unsafe_spin_unlock(&ref_acq_lock);
+
+	bpf_spin_unlock(&lock2);
+
+	if (m) {
+		bpf_obj_drop(m);
+		return -3;
+	}
+
+	return !!m;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1



* [PATCH v2 bpf-next 8/9] [DONOTAPPLY] selftests/bpf: Disable newly-added refcounted_kptr_races test
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (6 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 7/9] [DONOTAPPLY] selftests/bpf: Add test exercising bpf_refcount_acquire race condition Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-02  2:26 ` [PATCH v2 bpf-next 9/9] [DONOTAPPLY] Revert "selftests/bpf: Disable newly-added refcounted_kptr_races test" Dave Marchevsky
  2023-06-05 20:30 ` [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

The previous patch added a new test exercising a race condition which
was fixed earlier in the series. Similarly to other tests in this file,
the new test should not run while bpf_refcount_acquire is disabled as it
requires that kfunc.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../bpf/prog_tests/refcounted_kptr.c          | 100 ------------------
 1 file changed, 100 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index e7fcc1dd8864..6a53f304f3e4 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -18,103 +18,3 @@ void test_refcounted_kptr_fail(void)
 {
 	RUN_TESTS(refcounted_kptr_fail);
 }
-
-static void force_cpu(pthread_t thread, int cpunum)
-{
-	cpu_set_t cpuset;
-	int err;
-
-	CPU_ZERO(&cpuset);
-	CPU_SET(cpunum, &cpuset);
-	err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
-	if (!ASSERT_OK(err, "pthread_setaffinity_np"))
-		return;
-}
-
-struct refcounted_kptr *skel;
-
-static void *run_unstash_acq_ref(void *unused)
-{
-	LIBBPF_OPTS(bpf_test_run_opts, opts,
-		.data_in = &pkt_v4,
-		.data_size_in = sizeof(pkt_v4),
-		.repeat = 1,
-	);
-	long ret, unstash_acq_ref_fd;
-	force_cpu(pthread_self(), 1);
-
-	unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount);
-
-	ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts);
-	ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval");
-	ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3");
-	ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4");
-	ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5");
-	pthread_exit((void *)ret);
-}
-
-void test_refcounted_kptr_races(void)
-{
-	LIBBPF_OPTS(bpf_test_run_opts, opts,
-		.data_in = &pkt_v4,
-		.data_size_in = sizeof(pkt_v4),
-		.repeat = 1,
-	);
-	int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd;
-	int add_stash_fd, remove_tree_fd;
-	pthread_t thread_id;
-	int ret;
-
-	force_cpu(pthread_self(), 0);
-	skel = refcounted_kptr__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
-		return;
-
-	add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash);
-	remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree);
-	ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock);
-	ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock);
-	rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock);
-
-	ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts);
-	if (!ASSERT_OK(ret, "rem_node_lock"))
-		return;
-
-	ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts);
-	if (!ASSERT_OK(ret, "ref_acq_lock"))
-		return;
-
-	ret = bpf_prog_test_run_opts(add_stash_fd, &opts);
-	if (!ASSERT_OK(ret, "add_stash"))
-		return;
-	if (!ASSERT_OK(opts.retval, "add_stash retval"))
-		return;
-
-	ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL);
-	if (!ASSERT_OK(ret, "pthread_create"))
-		goto cleanup;
-
-	force_cpu(thread_id, 1);
-
-	/* This program will execute before unstash_acq_ref's refcount_acquire, then
-	 * unstash_acq_ref can proceed after unsafe_unlock
-	 */
-	ret = bpf_prog_test_run_opts(remove_tree_fd, &opts);
-	if (!ASSERT_OK(ret, "remove_tree"))
-		goto cleanup;
-
-	ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
-	if (!ASSERT_OK(ret, "ref_acq_unlock"))
-		goto cleanup;
-
-	ret = pthread_join(thread_id, NULL);
-	if (!ASSERT_OK(ret, "pthread_join"))
-		goto cleanup;
-
-	refcounted_kptr__destroy(skel);
-	return;
-cleanup:
-	bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
-	refcounted_kptr__destroy(skel);
-	return;
-}
-- 
2.34.1



* [PATCH v2 bpf-next 9/9] [DONOTAPPLY] Revert "selftests/bpf: Disable newly-added refcounted_kptr_races test"
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (7 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 8/9] [DONOTAPPLY] selftests/bpf: Disable newly-added refcounted_kptr_races test Dave Marchevsky
@ 2023-06-02  2:26 ` Dave Marchevsky
  2023-06-05 20:30 ` [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: Dave Marchevsky @ 2023-06-02  2:26 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, Dave Marchevsky

This patch reverts the previous patch's disabling of
refcounted_kptr_races selftest. It is included with the series so that
BPF CI will be able to run the test. This patch should not be applied -
followups which fix remaining bpf_refcount issues will re-enable this
test.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../bpf/prog_tests/refcounted_kptr.c          | 100 ++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 6a53f304f3e4..e7fcc1dd8864 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -18,3 +18,103 @@ void test_refcounted_kptr_fail(void)
 {
 	RUN_TESTS(refcounted_kptr_fail);
 }
+
+static void force_cpu(pthread_t thread, int cpunum)
+{
+	cpu_set_t cpuset;
+	int err;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpunum, &cpuset);
+	err = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
+	if (!ASSERT_OK(err, "pthread_setaffinity_np"))
+		return;
+}
+
+struct refcounted_kptr *skel;
+
+static void *run_unstash_acq_ref(void *unused)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	long ret, unstash_acq_ref_fd;
+	force_cpu(pthread_self(), 1);
+
+	unstash_acq_ref_fd = bpf_program__fd(skel->progs.unstash_add_and_acquire_refcount);
+
+	ret = bpf_prog_test_run_opts(unstash_acq_ref_fd, &opts);
+	ASSERT_EQ(opts.retval, 0, "unstash_add_and_acquire_refcount retval");
+	ASSERT_EQ(skel->bss->ref_check_3, 2, "ref_check_3");
+	ASSERT_EQ(skel->bss->ref_check_4, 1, "ref_check_4");
+	ASSERT_EQ(skel->bss->ref_check_5, 0, "ref_check_5");
+	pthread_exit((void *)ret);
+}
+
+void test_refcounted_kptr_races(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	int ref_acq_lock_fd, ref_acq_unlock_fd, rem_node_lock_fd;
+	int add_stash_fd, remove_tree_fd;
+	pthread_t thread_id;
+	int ret;
+
+	force_cpu(pthread_self(), 0);
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	add_stash_fd = bpf_program__fd(skel->progs.add_refcounted_node_to_tree_and_stash);
+	remove_tree_fd = bpf_program__fd(skel->progs.remove_refcounted_node_from_tree);
+	ref_acq_lock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_lock);
+	ref_acq_unlock_fd = bpf_program__fd(skel->progs.unsafe_ref_acq_unlock);
+	rem_node_lock_fd = bpf_program__fd(skel->progs.unsafe_rem_node_lock);
+
+	ret = bpf_prog_test_run_opts(rem_node_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "rem_node_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(ref_acq_lock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_lock"))
+		return;
+
+	ret = bpf_prog_test_run_opts(add_stash_fd, &opts);
+	if (!ASSERT_OK(ret, "add_stash"))
+		return;
+	if (!ASSERT_OK(opts.retval, "add_stash retval"))
+		return;
+
+	ret = pthread_create(&thread_id, NULL, &run_unstash_acq_ref, NULL);
+	if (!ASSERT_OK(ret, "pthread_create"))
+		goto cleanup;
+
+	force_cpu(thread_id, 1);
+
+	/* This program will execute before unstash_acq_ref's refcount_acquire, then
+	 * unstash_acq_ref can proceed after unsafe_unlock
+	 */
+	ret = bpf_prog_test_run_opts(remove_tree_fd, &opts);
+	if (!ASSERT_OK(ret, "remove_tree"))
+		goto cleanup;
+
+	ret = bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	if (!ASSERT_OK(ret, "ref_acq_unlock"))
+		goto cleanup;
+
+	ret = pthread_join(thread_id, NULL);
+	if (!ASSERT_OK(ret, "pthread_join"))
+		goto cleanup;
+
+	refcounted_kptr__destroy(skel);
+	return;
+cleanup:
+	bpf_prog_test_run_opts(ref_acq_unlock_fd, &opts);
+	refcounted_kptr__destroy(skel);
+	return;
+}
-- 
2.34.1



* Re: [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1)
  2023-06-02  2:26 [PATCH v2 bpf-next 0/9] bpf_refcount followups (part 1) Dave Marchevsky
                   ` (8 preceding siblings ...)
  2023-06-02  2:26 ` [PATCH v2 bpf-next 9/9] [DONOTAPPLY] Revert "selftests/bpf: Disable newly-added refcounted_kptr_races test" Dave Marchevsky
@ 2023-06-05 20:30 ` patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-05 20:30 UTC (permalink / raw)
  To: Dave Marchevsky; +Cc: bpf, ast, daniel, andrii, martin.lau, kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 1 Jun 2023 19:26:38 -0700 you wrote:
> This series is the first of two (or more) followups to address issues in the
> bpf_refcount shared ownership implementation discovered by Kumar.
> Specifically, this series addresses the "bpf_refcount_acquire on non-owning ref
> in another tree" scenario described in [0], and does _not_ address issues
> raised in [1]. Further followups will address the other issues.
> 
> The series can be applied without re-enabling bpf_refcount_acquire calls, which
> were disabled in commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc
> calls until race conditions are fixed") until all issues are addressed. Some
> extra patches are included so that BPF CI tests will exercise test changes in
> the series.
> 
> [...]

Here is the summary with links:
  - [v2,bpf-next,1/9,DONOTAPPLY] Revert "bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"
    (no matching commit)
  - [v2,bpf-next,2/9] bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs
    https://git.kernel.org/bpf/bpf-next/c/2140a6e3422d
  - [v2,bpf-next,3/9] bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation
    https://git.kernel.org/bpf/bpf-next/c/cc0d76cafebb
  - [v2,bpf-next,4/9] bpf: Make bpf_refcount_acquire fallible for non-owning refs
    https://git.kernel.org/bpf/bpf-next/c/7793fc3babe9
  - [v2,bpf-next,5/9,DONOTAPPLY] bpf: Allow KF_DESTRUCTIVE-flagged kfuncs to be called under spinlock
    (no matching commit)
  - [v2,bpf-next,6/9,DONOTAPPLY] selftests/bpf: Add unsafe lock/unlock and refcount_read kfuncs to bpf_testmod
    (no matching commit)
  - [v2,bpf-next,7/9,DONOTAPPLY] selftests/bpf: Add test exercising bpf_refcount_acquire race condition
    (no matching commit)
  - [v2,bpf-next,8/9,DONOTAPPLY] selftests/bpf: Disable newly-added refcounted_kptr_races test
    (no matching commit)
  - [v2,bpf-next,9/9,DONOTAPPLY] Revert "selftests/bpf: Disable newly-added refcounted_kptr_races test"
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



