* [PATCH bpf-next v4 0/3] fixes for concurrent htab updates
From: Hou Tao @ 2022-08-31  4:26 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

This patchset fixes the issues found while investigating the syzkaller
problem reported in [0]: concurrent updates to the same hash-table
bucket may spuriously fail, as shown in patch 1.

Patch 1 uses preempt_disable() to fix the problem for the
htab_use_raw_lock() case. For the !htab_use_raw_lock() case, the problem
is left to the "BPF specific memory allocator" patchset [1], in which
!htab_use_raw_lock() will be removed.

Patch 2 fixes the out-of-bounds memory read problem reported in [0]. It
has the same root cause as patch 1 and is fixed by handling -EBUSY from
htab_lock_bucket() correctly.

Patch 3 adds two test cases for hash-table update: one for the reentrancy
of bpf_map_update_elem(), and another for concurrent updates of the
same hash-table bucket.
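
For illustration only (not part of this series), below is a minimal
userspace sketch that can trigger the spurious update failure fixed by
patch 1 on a CONFIG_PREEMPT kernel without the fix. It mirrors the
selftest in patch 3: all threads are pinned to CPU 0 so they contend on
the same per-cpu map_locked counter. The map name, thread and iteration
counts are arbitrary assumptions, it needs CAP_BPF/CAP_SYS_ADMIN, and it
may not trigger on every run:

  #define _GNU_SOURCE
  #include <errno.h>
  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>
  #include <bpf/bpf.h>

  static int map_fd;

  static void *updater(void *arg)
  {
          __u32 key = 0, value = 0;
          cpu_set_t cpus;
          int i;

          /* pin every thread to CPU 0 so they contend on the same
           * per-cpu map_locked counter
           */
          CPU_ZERO(&cpus);
          CPU_SET(0, &cpus);
          pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);

          for (i = 0; i < 100000; i++) {
                  if (bpf_map_update_elem(map_fd, &key, &value, BPF_ANY) &&
                      errno == EBUSY) {
                          printf("spurious EBUSY after %d updates\n", i);
                          break;
                  }
          }
          return NULL;
  }

  int main(void)
  {
          pthread_t tids[4];
          int i;

          /* needs CAP_BPF or CAP_SYS_ADMIN */
          map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "htab", sizeof(__u32),
                                  sizeof(__u32), 1, NULL);
          if (map_fd < 0)
                  return 1;

          for (i = 0; i < 4; i++)
                  pthread_create(&tids[i], NULL, updater, NULL);
          for (i = 0; i < 4; i++)
                  pthread_join(tids[i], NULL);
          return 0;
  }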

Comments are always welcome.

Regards,
Tao

[0]: https://lore.kernel.org/bpf/CACkBjsbuxaR6cv0kXJoVnBfL9ZJXjjoUcMpw_Ogc313jSrg14A@mail.gmail.com/
[1]: https://lore.kernel.org/bpf/20220819214232.18784-1-alexei.starovoitov@gmail.com/

Change Log:

v4:
 * rebased on bpf-next
 * add htab_update to DENYLIST.s390x

v3: https://lore.kernel.org/bpf/20220829023709.1958204-1-houtao@huaweicloud.com/
 * patch 1: update commit message and add Fixes tag
 * patch 2: add Fixes tag
 * patch 3: elaborate the description of test cases

v2: https://lore.kernel.org/bpf/bd60ef93-1c6a-2db2-557d-b09b92ad22bd@huaweicloud.com/
 * Note in the commit message that the fix is for the CONFIG_PREEMPT case
   and add a Reviewed-by tag for patch 1
 * Drop patch "bpf: Allow normally concurrent map updates for !htab_use_raw_lock() case"

v1: https://lore.kernel.org/bpf/20220821033223.2598791-1-houtao@huaweicloud.com/

Hou Tao (3):
  bpf: Disable preemption when increasing per-cpu map_locked
  bpf: Propagate error from htab_lock_bucket() to userspace
  selftests/bpf: Add test cases for htab update

 kernel/bpf/hashtab.c                          |  30 ++++-
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/htab_update.c    | 126 ++++++++++++++++++
 .../testing/selftests/bpf/progs/htab_update.c |  29 ++++
 4 files changed, 179 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c

-- 
2.29.2



* [PATCH bpf-next v4 1/3] bpf: Disable preemption when increasing per-cpu map_locked
From: Hou Tao @ 2022-08-31  4:26 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

The per-cpu htab->map_locked counter is used to prohibit concurrent
accesses from both NMI and non-NMI contexts. But since commit
74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT"),
a task running under migrate_disable() is still preemptible under
CONFIG_PREEMPT, so map_locked now also rejects concurrent updates from
normal contexts (e.g. userspace processes) unexpectedly, as shown below:

process A                      process B

htab_map_update_elem()
  htab_lock_bucket()
    migrate_disable()
    /* return 1 */
    __this_cpu_inc_return()
    /* preempted by B */

                               htab_map_update_elem()
                                 /* the same bucket as A */
                                 htab_lock_bucket()
                                   migrate_disable()
                                   /* return 2, so lock fails */
                                   __this_cpu_inc_return()
                                   return -EBUSY

A seemingly feasible fix is to use in_nmi() in htab_lock_bucket() and
only check the value of map_locked in NMI context. But that would
re-introduce a deadlock on the bucket lock if htab_lock_bucket() is
re-entered through a non-tracing program (e.g. an fentry program).

One cannot simply use preempt_disable() to fix this issue in all cases:
when htab_use_raw_lock() is false, the bucket lock is a spinlock which
can sleep (this only happens in the PREEMPT_RT case) and therefore must
not be taken with preemption disabled.

Therefore, keep migrate_disable() for the sleepable-spinlock case and use
preempt_disable() for the raw-spinlock case, and defer fixing concurrent
updates for the former case until the kernel has its own BPF memory
allocator.

Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..6fb3b7fd1622 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
 				   unsigned long *pflags)
 {
 	unsigned long flags;
+	bool use_raw_lock;
 
 	hash = hash & HASHTAB_MAP_LOCK_MASK;
 
-	migrate_disable();
+	use_raw_lock = htab_use_raw_lock(htab);
+	if (use_raw_lock)
+		preempt_disable();
+	else
+		migrate_disable();
 	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
 		__this_cpu_dec(*(htab->map_locked[hash]));
-		migrate_enable();
+		if (use_raw_lock)
+			preempt_enable();
+		else
+			migrate_enable();
 		return -EBUSY;
 	}
 
-	if (htab_use_raw_lock(htab))
+	if (use_raw_lock)
 		raw_spin_lock_irqsave(&b->raw_lock, flags);
 	else
 		spin_lock_irqsave(&b->lock, flags);
@@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
 				      struct bucket *b, u32 hash,
 				      unsigned long flags)
 {
+	bool use_raw_lock = htab_use_raw_lock(htab);
+
 	hash = hash & HASHTAB_MAP_LOCK_MASK;
-	if (htab_use_raw_lock(htab))
+	if (use_raw_lock)
 		raw_spin_unlock_irqrestore(&b->raw_lock, flags);
 	else
 		spin_unlock_irqrestore(&b->lock, flags);
 	__this_cpu_dec(*(htab->map_locked[hash]));
-	migrate_enable();
+	if (use_raw_lock)
+		preempt_enable();
+	else
+		migrate_enable();
 }
 
 static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
-- 
2.29.2



* [PATCH bpf-next v4 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: Hou Tao @ 2022-08-31  4:26 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

In __htab_map_lookup_and_delete_batch(), if htab_lock_bucket() returns
-EBUSY, the code moves on to the next bucket. Doing so not only silently
skips the elements in the current bucket, but may also incur an
out-of-bounds memory access or expose kernel memory to userspace when the
current bucket_cnt is greater than bucket_size or is zero.

Fix it by stopping the batch operation and returning -EBUSY when
htab_lock_bucket() fails; the application can then retry or skip the busy
batch as needed.
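
As a rough userspace illustration (not part of this patch), a caller of
libbpf's bpf_map_lookup_and_delete_batch() could handle the propagated
-EBUSY as sketched below. The 4-byte key/value layout, the batch size and
the retry limit are assumptions, and the sketch relies on the libbpf 1.0
convention of returning negative errno values:

  #include <errno.h>
  #include <bpf/bpf.h>

  /* Drain a hash map with 4-byte keys/values in batches,
   * retrying busy batches a few times.
   */
  static int drain_htab(int map_fd)
  {
          __u32 keys[64], values[64];
          __u32 out_batch, count;
          void *in = NULL;
          int err, retries = 0;

          for (;;) {
                  count = 64;
                  err = bpf_map_lookup_and_delete_batch(map_fd, in, &out_batch,
                                                        keys, values, &count,
                                                        NULL);
                  if (err && err != -EBUSY && err != -ENOENT)
                          return err;                 /* hard error */

                  /* ... consume the "count" elements in keys/values ... */

                  if (err == -ENOENT)                 /* whole map traversed */
                          return 0;
                  if (err == -EBUSY && ++retries > 3)
                          return err;                 /* bucket stayed busy, give up */
                  if (!err)
                          retries = 0;
                  in = &out_batch;                    /* resume from returned token */
          }
  }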

Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/hashtab.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 6fb3b7fd1622..eb1263f03e9b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	/* do not grab the lock unless need it (bucket_cnt > 0). */
 	if (locked) {
 		ret = htab_lock_bucket(htab, b, batch, &flags);
-		if (ret)
-			goto next_batch;
+		if (ret) {
+			rcu_read_unlock();
+			bpf_enable_instrumentation();
+			goto after_loop;
+		}
 	}
 
 	bucket_cnt = 0;
-- 
2.29.2



* [PATCH bpf-next v4 3/3] selftests/bpf: Add test cases for htab update
From: Hou Tao @ 2022-08-31  4:26 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

One test demonstrates that a reentrant update of the same hash-map bucket
should fail, and the other shows that concurrent updates of the same
hash-map bucket should succeed and not be rejected by the reentrancy
check on the bucket lock.

There is no trampoline support on s390x, so add htab_update to the
denylist.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/htab_update.c    | 126 ++++++++++++++++++
 .../testing/selftests/bpf/progs/htab_update.c |  29 ++++
 3 files changed, 156 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 736b65f61022..ba02b559ca68 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -68,3 +68,4 @@ unpriv_bpf_disabled                      # fentry
 setget_sockopt                           # attach unexpected error: -524                                               (trampoline)
 cb_refs                                  # expected error message unexpected error: -524                               (trampoline)
 cgroup_hierarchical_stats                # JIT does not support calling kernel function                                (kfunc)
+htab_update                              # failed to attach: ERROR: strerror_r(-524)=22                                (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
new file mode 100644
index 000000000000..2bc85f4814f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdbool.h>
+#include <test_progs.h>
+#include "htab_update.skel.h"
+
+struct htab_update_ctx {
+	int fd;
+	int loop;
+	bool stop;
+};
+
+static void test_reenter_update(void)
+{
+	struct htab_update *skel;
+	unsigned int key, value;
+	int err;
+
+	skel = htab_update__open();
+	if (!ASSERT_OK_PTR(skel, "htab_update__open"))
+		return;
+
+	/* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
+	bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
+	err = htab_update__load(skel);
+	if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
+		goto out;
+
+	skel->bss->pid = getpid();
+	err = htab_update__attach(skel);
+	if (!ASSERT_OK(err, "htab_update__attach"))
+		goto out;
+
+	/* Will trigger the reentrancy of bpf_map_update_elem() */
+	key = 0;
+	value = 0;
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
+	if (!ASSERT_OK(err, "add element"))
+		goto out;
+
+	ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
+out:
+	htab_update__destroy(skel);
+}
+
+static void *htab_update_thread(void *arg)
+{
+	struct htab_update_ctx *ctx = arg;
+	cpu_set_t cpus;
+	int i;
+
+	/* Pinned on CPU 0 */
+	CPU_ZERO(&cpus);
+	CPU_SET(0, &cpus);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
+
+	i = 0;
+	while (i++ < ctx->loop && !ctx->stop) {
+		unsigned int key = 0, value = 0;
+		int err;
+
+		err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
+		if (err) {
+			ctx->stop = true;
+			return (void *)(long)err;
+		}
+	}
+
+	return NULL;
+}
+
+static void test_concurrent_update(void)
+{
+	struct htab_update_ctx ctx;
+	struct htab_update *skel;
+	unsigned int i, nr;
+	pthread_t *tids;
+	int err;
+
+	skel = htab_update__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
+		return;
+
+	ctx.fd = bpf_map__fd(skel->maps.htab);
+	ctx.loop = 1000;
+	ctx.stop = false;
+
+	nr = 4;
+	tids = calloc(nr, sizeof(*tids));
+	if (!ASSERT_NEQ(tids, NULL, "no mem"))
+		goto out;
+
+	for (i = 0; i < nr; i++) {
+		err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
+		if (!ASSERT_OK(err, "pthread_create")) {
+			unsigned int j;
+
+			ctx.stop = true;
+			for (j = 0; j < i; j++)
+				pthread_join(tids[j], NULL);
+			goto out;
+		}
+	}
+
+	for (i = 0; i < nr; i++) {
+		void *thread_err = NULL;
+
+		pthread_join(tids[i], &thread_err);
+		ASSERT_EQ(thread_err, NULL, "update error");
+	}
+
+out:
+	if (tids)
+		free(tids);
+	htab_update__destroy(skel);
+}
+
+void test_htab_update(void)
+{
+	if (test__start_subtest("reenter_update"))
+		test_reenter_update();
+	if (test__start_subtest("concurrent_update"))
+		test_concurrent_update();
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
new file mode 100644
index 000000000000..7481bb30b29b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_update.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} htab SEC(".maps");
+
+int pid = 0;
+int update_err = 0;
+
+SEC("?fentry/lookup_elem_raw")
+int lookup_elem_raw(void *ctx)
+{
+	__u32 key = 0, value = 1;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != pid)
+		return 0;
+
+	update_err = bpf_map_update_elem(&htab, &key, &value, 0);
+	return 0;
+}
-- 
2.29.2



* Re: [PATCH bpf-next v4 0/3] fixes for concurrent htab updates
From: patchwork-bot+netdevbpf @ 2022-08-31 21:20 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, songliubraving, bigeasy, sunhao.th, haoluo, andrii, yhs,
	ast, daniel, kafai, kpsingh, davem, kuba, sdf, jolsa,
	john.fastabend, oss, houtao1

Hello:

This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Wed, 31 Aug 2022 12:26:26 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> This patchset fixes the issues found while investigating the
> syzkaller problem reported in [0]: concurrent updates to the same
> hash-table bucket may spuriously fail, as shown in patch 1.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v4,1/3] bpf: Disable preemption when increasing per-cpu map_locked
    https://git.kernel.org/bpf/bpf-next/c/2775da216287
  - [bpf-next,v4,2/3] bpf: Propagate error from htab_lock_bucket() to userspace
    https://git.kernel.org/bpf/bpf-next/c/66a7a92e4d0d
  - [bpf-next,v4,3/3] selftests/bpf: Add test cases for htab update
    https://git.kernel.org/bpf/bpf-next/c/1c636b6277a2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



