* [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
@ 2022-08-27 10:01 Hou Tao
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
Per-cpu htab->map_locked is used to prohibit the concurrent accesses
from both NMI and non-NMI contexts. But since commit 74d862b682f5
("sched: Make migrate_disable/enable() independent of RT"),
migrations_disable() is also preemptible under CONFIG_PREEMPT case,
so now map_locked also disallows concurrent updates from normal contexts
(e.g. userspace processes) unexpectedly as shown below:
process A                        process B

htab_map_update_elem()
  htab_lock_bucket()
    migrate_disable()
    /* return 1 */
    __this_cpu_inc_return()
    /* preempted by B */

                                 htab_map_update_elem()
                                 /* the same bucket as A */
                                   htab_lock_bucket()
                                     migrate_disable()
                                     /* return 2, so lock fails */
                                     __this_cpu_inc_return()
                                     return -EBUSY
A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
only checking the value of map_locked for nmi context. But it will
re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
through non-tracing program (e.g. fentry program).
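A rough sketch of that rejected approach (hypothetical code, not taken from
any posted patch) and of the deadlock it would reintroduce:

static inline int htab_lock_bucket_nmi_only(const struct bpf_htab *htab,
					    struct bucket *b, u32 hash,
					    unsigned long *pflags)
{
	unsigned long flags;

	hash = hash & HASHTAB_MAP_LOCK_MASK;
	migrate_disable();
	/* only treat a contended map_locked as fatal in NMI context */
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1) &&
	    in_nmi()) {
		__this_cpu_dec(*(htab->map_locked[hash]));
		migrate_enable();
		return -EBUSY;
	}
	raw_spin_lock_irqsave(&b->raw_lock, flags);
	/*
	 * A non-NMI re-entry (e.g. an fentry program attached to a function
	 * called while the bucket lock is held) is not rejected above, so it
	 * would spin here on a raw_lock already owned by this CPU: deadlock.
	 */
	*pflags = flags;
	return 0;
}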
So fixing it by using disable_preempt() instead of migrate_disable() when
increasing htab->map_locked. However when htab_use_raw_lock() is false,
bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
so still use migrate_disable() for spin-lock case and leave the
concurrent map updates problem to BPF memory allocator patchset in which
!htab_use_raw_lock() case will be removed.
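For context (this helper lives in kernel/bpf/hashtab.c but is outside the
diff below), the decision between the raw and the sleepable bucket lock is
roughly the following; the exact body may differ slightly between kernel
versions:

static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
{
	/* PREEMPT_RT can only use the raw spinlock when the htab is
	 * preallocated, i.e. no allocation can happen under the lock.
	 */
	return !IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab);
}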
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..6fb3b7fd1622 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
unsigned long *pflags)
{
unsigned long flags;
+ bool use_raw_lock;
hash = hash & HASHTAB_MAP_LOCK_MASK;
- migrate_disable();
+ use_raw_lock = htab_use_raw_lock(htab);
+ if (use_raw_lock)
+ preempt_disable();
+ else
+ migrate_disable();
if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
return -EBUSY;
}
- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_lock_irqsave(&b->raw_lock, flags);
else
spin_lock_irqsave(&b->lock, flags);
@@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
struct bucket *b, u32 hash,
unsigned long flags)
{
+ bool use_raw_lock = htab_use_raw_lock(htab);
+
hash = hash & HASHTAB_MAP_LOCK_MASK;
- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_unlock_irqrestore(&b->raw_lock, flags);
else
spin_unlock_irqrestore(&b->lock, flags);
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
}
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
--
2.29.2
* [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
-EBUSY, it will go to next bucket. Going to next bucket may not only
skip the elements in current bucket silently, but also incur
out-of-bound memory access or expose kernel memory to userspace if
current bucket_cnt is greater than bucket_size or zero.
Fixing it by stopping batch operation and returning -EBUSY when
htab_lock_bucket() fails, and the application can retry or skip the busy
batch as needed.
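A minimal user-space sketch of the retry-on-EBUSY pattern this enables
(illustrative only: the function and buffer names are made up, and it
assumes a libbpf version whose batch helpers return negative error codes
directly):

#include <errno.h>
#include <bpf/bpf.h>

/* Walk and drain a __u32 -> __u32 hash map with the batch API, retrying
 * a batch that fails with -EBUSY instead of silently losing its bucket.
 */
static int drain_htab(int fd, __u32 *keys, __u32 *values, __u32 batch_sz)
{
	__u32 in_batch = 0, out_batch = 0, count;
	void *in = NULL;	/* NULL means "start from the first batch" */
	int err;

	for (;;) {
		count = batch_sz;
		err = bpf_map_lookup_and_delete_batch(fd, in, &out_batch,
						      keys, values, &count,
						      NULL);
		if (err == -EBUSY)
			continue;		/* busy bucket: retry this batch */
		if (err && err != -ENOENT)
			return err;		/* hard failure */

		/* ... consume 'count' key/value pairs here ... */

		if (err == -ENOENT)
			return 0;		/* whole map visited */
		in_batch = out_batch;
		in = &in_batch;
	}
}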
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 6fb3b7fd1622..eb1263f03e9b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
/* do not grab the lock unless need it (bucket_cnt > 0). */
if (locked) {
ret = htab_lock_bucket(htab, b, batch, &flags);
- if (ret)
- goto next_batch;
+ if (ret) {
+ rcu_read_unlock();
+ bpf_enable_instrumentation();
+ goto after_loop;
+ }
}
bucket_cnt = 0;
--
2.29.2
* [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
One test demonstrates the reentrancy of hash map update fails, and
another one shows concureently updates of the same hash map bucket
succeed.
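For readers of the tests below, the re-entry path that the first subtest
relies on is roughly the following (reconstructed from the kernel sources,
not spelled out in the original posting):

bpf_map_update_elem()                    /* syscall from user space */
  htab_map_update_elem()
    htab_lock_bucket()                   /* map_locked[hash] becomes 1 */
    lookup_elem_raw()
      fentry program lookup_elem_raw()   /* the test BPF program below */
        bpf_map_update_elem()            /* helper, same map and bucket */
          htab_map_update_elem()
            htab_lock_bucket()           /* map_locked[hash] would become 2,
                                            so it returns -EBUSY, which the
                                            program records in update_err */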
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
.../testing/selftests/bpf/progs/htab_update.c | 29 ++++
2 files changed, 155 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
new file mode 100644
index 000000000000..e2a4034daa79
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdbool.h>
+#include <test_progs.h>
+#include "htab_update.skel.h"
+
+struct htab_update_ctx {
+ int fd;
+ int loop;
+ bool stop;
+};
+
+static void test_reenter_update(void)
+{
+ struct htab_update *skel;
+ unsigned int key, value;
+ int err;
+
+ skel = htab_update__open();
+ if (!ASSERT_OK_PTR(skel, "htab_update__open"))
+ return;
+
+ /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
+ bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
+ err = htab_update__load(skel);
+ if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
+ goto out;
+
+ skel->bss->pid = getpid();
+ err = htab_update__attach(skel);
+ if (!ASSERT_OK(err, "htab_update__attach"))
+ goto out;
+
+ /* Will trigger the reentrancy of bpf_map_update_elem() */
+ key = 0;
+ value = 0;
+ err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
+ if (!ASSERT_OK(err, "add element"))
+ goto out;
+
+ ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
+out:
+ htab_update__destroy(skel);
+}
+
+static void *htab_update_thread(void *arg)
+{
+ struct htab_update_ctx *ctx = arg;
+ cpu_set_t cpus;
+ int i;
+
+ /* Pin on CPU 0 */
+ CPU_ZERO(&cpus);
+ CPU_SET(0, &cpus);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
+
+ i = 0;
+ while (i++ < ctx->loop && !ctx->stop) {
+ unsigned int key = 0, value = 0;
+ int err;
+
+ err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
+ if (err) {
+ ctx->stop = true;
+ return (void *)(long)err;
+ }
+ }
+
+ return NULL;
+}
+
+static void test_concurrent_update(void)
+{
+ struct htab_update_ctx ctx;
+ struct htab_update *skel;
+ unsigned int i, nr;
+ pthread_t *tids;
+ int err;
+
+ skel = htab_update__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
+ return;
+
+ ctx.fd = bpf_map__fd(skel->maps.htab);
+ ctx.loop = 1000;
+ ctx.stop = false;
+
+ nr = 4;
+ tids = calloc(nr, sizeof(*tids));
+ if (!ASSERT_NEQ(tids, NULL, "no mem"))
+ goto out;
+
+ for (i = 0; i < nr; i++) {
+ err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
+ if (!ASSERT_OK(err, "pthread_create")) {
+ unsigned int j;
+
+ ctx.stop = true;
+ for (j = 0; j < i; j++)
+ pthread_join(tids[j], NULL);
+ goto out;
+ }
+ }
+
+ for (i = 0; i < nr; i++) {
+ void *thread_err = NULL;
+
+ pthread_join(tids[i], &thread_err);
+ ASSERT_EQ(thread_err, NULL, "update error");
+ }
+
+out:
+ if (tids)
+ free(tids);
+ htab_update__destroy(skel);
+}
+
+void test_htab_update(void)
+{
+ if (test__start_subtest("reenter_update"))
+ test_reenter_update();
+ if (test__start_subtest("concurrent_update"))
+ test_concurrent_update();
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
new file mode 100644
index 000000000000..7481bb30b29b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_update.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(__u32));
+} htab SEC(".maps");
+
+int pid = 0;
+int update_err = 0;
+
+SEC("?fentry/lookup_elem_raw")
+int lookup_elem_raw(void *ctx)
+{
+ __u32 key = 0, value = 1;
+
+ if ((bpf_get_current_pid_tgid() >> 32) != pid)
+ return 0;
+
+ update_err = bpf_map_update_elem(&htab, &key, &value, 0);
+ return 0;
+}
--
2.29.2
* Re: [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: KP Singh @ 2022-08-28 0:24 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
> -EBUSY, it will go to next bucket. Going to next bucket may not only
> skip the elements in current bucket silently, but also incur
> out-of-bound memory access or expose kernel memory to userspace if
> current bucket_cnt is greater than bucket_size or zero.
>
> Fixing it by stopping batch operation and returning -EBUSY when
> htab_lock_bucket() fails, and the application can retry or skip the busy
> batch as needed.
>
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Please add a Fixes tag here
> ---
> kernel/bpf/hashtab.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6fb3b7fd1622..eb1263f03e9b 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> /* do not grab the lock unless need it (bucket_cnt > 0). */
> if (locked) {
> ret = htab_lock_bucket(htab, b, batch, &flags);
> - if (ret)
> - goto next_batch;
> + if (ret) {
> + rcu_read_unlock();
> + bpf_enable_instrumentation();
> + goto after_loop;
> + }
> }
>
> bucket_cnt = 0;
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: KP Singh @ 2022-08-28 22:39 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> One test demonstrates the reentrancy of hash map update fails, and
> another one shows concureently updates of the same hash map bucket
"concurrent updates of the same hashmap"
This is just a description of what the test does but not why?
What's the expected behaviour? Was it broken?
I think your whole series will benefit from a cover letter to explain
the stuff you are fixing.
> succeed.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> .../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
> .../testing/selftests/bpf/progs/htab_update.c | 29 ++++
> 2 files changed, 155 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
> create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
> new file mode 100644
> index 000000000000..e2a4034daa79
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
> @@ -0,0 +1,126 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
> +#define _GNU_SOURCE
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <test_progs.h>
> +#include "htab_update.skel.h"
> +
> +struct htab_update_ctx {
> + int fd;
> + int loop;
> + bool stop;
> +};
> +
> +static void test_reenter_update(void)
> +{
> + struct htab_update *skel;
> + unsigned int key, value;
> + int err;
> +
> + skel = htab_update__open();
> + if (!ASSERT_OK_PTR(skel, "htab_update__open"))
> + return;
> +
> + /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
> + bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
> + err = htab_update__load(skel);
> + if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
> + goto out;
> +
> + skel->bss->pid = getpid();
> + err = htab_update__attach(skel);
> + if (!ASSERT_OK(err, "htab_update__attach"))
> + goto out;
> +
> + /* Will trigger the reentrancy of bpf_map_update_elem() */
> + key = 0;
> + value = 0;
nit: just move these initializations to the top.
> + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
> + if (!ASSERT_OK(err, "add element"))
> + goto out;
> +
> + ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
> +out:
> + htab_update__destroy(skel);
> +}
> +
> +static void *htab_update_thread(void *arg)
> +{
> + struct htab_update_ctx *ctx = arg;
> + cpu_set_t cpus;
> + int i;
> +
> + /* Pin on CPU 0 */
> + CPU_ZERO(&cpus);
> + CPU_SET(0, &cpus);
> + pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
> +
> + i = 0;
> + while (i++ < ctx->loop && !ctx->stop) {
nit: for loop?
> + unsigned int key = 0, value = 0;
> + int err;
> +
> + err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
> + if (err) {
> + ctx->stop = true;
> + return (void *)(long)err;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static void test_concurrent_update(void)
> +{
> + struct htab_update_ctx ctx;
> + struct htab_update *skel;
> + unsigned int i, nr;
> + pthread_t *tids;
> + int err;
> +
> + skel = htab_update__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
> + return;
> +
> + ctx.fd = bpf_map__fd(skel->maps.htab);
> + ctx.loop = 1000;
> + ctx.stop = false;
> +
> + nr = 4;
> + tids = calloc(nr, sizeof(*tids));
> + if (!ASSERT_NEQ(tids, NULL, "no mem"))
> + goto out;
> +
> + for (i = 0; i < nr; i++) {
> + err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
> + if (!ASSERT_OK(err, "pthread_create")) {
> + unsigned int j;
> +
> + ctx.stop = true;
> + for (j = 0; j < i; j++)
> + pthread_join(tids[j], NULL);
> + goto out;
> + }
> + }
> +
> + for (i = 0; i < nr; i++) {
> + void *thread_err = NULL;
> +
> + pthread_join(tids[i], &thread_err);
> + ASSERT_EQ(thread_err, NULL, "update error");
> + }
> +
> +out:
> + if (tids)
> + free(tids);
> + htab_update__destroy(skel);
> +}
> +
> +void test_htab_update(void)
> +{
> + if (test__start_subtest("reenter_update"))
> + test_reenter_update();
> + if (test__start_subtest("concurrent_update"))
> + test_concurrent_update();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
> new file mode 100644
> index 000000000000..7481bb30b29b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/htab_update.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_HASH);
> + __uint(max_entries, 1);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(__u32));
> +} htab SEC(".maps");
> +
> +int pid = 0;
> +int update_err = 0;
> +
> +SEC("?fentry/lookup_elem_raw")
> +int lookup_elem_raw(void *ctx)
> +{
> + __u32 key = 0, value = 1;
> +
> + if ((bpf_get_current_pid_tgid() >> 32) != pid)
> + return 0;
> +
> + update_err = bpf_map_update_elem(&htab, &key, &value, 0);
> + return 0;
> +}
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
From: KP Singh @ 2022-08-28 22:39 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
> from both NMI and non-NMI contexts. But since commit 74d862b682f5
> ("sched: Make migrate_disable/enable() independent of RT"),
> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
nit: migrate_disable
> so now map_locked also disallows concurrent updates from normal contexts
> (e.g. userspace processes) unexpectedly as shown below:
>
> process A process B
>
> htab_map_update_elem()
> htab_lock_bucket()
> migrate_disable()
> /* return 1 */
> __this_cpu_inc_return()
> /* preempted by B */
>
> htab_map_update_elem()
> /* the same bucket as A */
> htab_lock_bucket()
> migrate_disable()
> /* return 2, so lock fails */
> __this_cpu_inc_return()
> return -EBUSY
>
> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
> only checking the value of map_locked for nmi context. But it will
> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
> through non-tracing program (e.g. fentry program).
>
> So fixing it by using disable_preempt() instead of migrate_disable() when
> increasing htab->map_locked. However when htab_use_raw_lock() is false,
> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
> so still use migrate_disable() for spin-lock case and leave the
> concurrent map updates problem to BPF memory allocator patchset in which
> !htab_use_raw_lock() case will be removed.
I think the description needs a bit more clarity, and I think you mean
preempt_disable() instead of disable_preempt().
Suggestion:
One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().
Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.
>
> Reviewed-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Fixes tag here please?
> ---
> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index b301a63afa2f..6fb3b7fd1622 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
> unsigned long *pflags)
> {
> unsigned long flags;
> + bool use_raw_lock;
>
> hash = hash & HASHTAB_MAP_LOCK_MASK;
>
> - migrate_disable();
> + use_raw_lock = htab_use_raw_lock(htab);
> + if (use_raw_lock)
> + preempt_disable();
> + else
> + migrate_disable();
> if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
> __this_cpu_dec(*(htab->map_locked[hash]));
> - migrate_enable();
> + if (use_raw_lock)
> + preempt_enable();
> + else
> + migrate_enable();
> return -EBUSY;
> }
>
> - if (htab_use_raw_lock(htab))
> + if (use_raw_lock)
> raw_spin_lock_irqsave(&b->raw_lock, flags);
> else
> spin_lock_irqsave(&b->lock, flags);
> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
> struct bucket *b, u32 hash,
> unsigned long flags)
> {
> + bool use_raw_lock = htab_use_raw_lock(htab);
> +
> hash = hash & HASHTAB_MAP_LOCK_MASK;
> - if (htab_use_raw_lock(htab))
> + if (use_raw_lock)
> raw_spin_unlock_irqrestore(&b->raw_lock, flags);
> else
> spin_unlock_irqrestore(&b->lock, flags);
> __this_cpu_dec(*(htab->map_locked[hash]));
> - migrate_enable();
> + if (use_raw_lock)
> + preempt_enable();
> + else
> + migrate_enable();
> }
>
> static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: Hou Tao @ 2022-08-29 1:12 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
Hi,
On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> One test demonstrates the reentrancy of hash map update fails, and
>> another one shows concureently updates of the same hash map bucket
> "concurrent updates of the same hashmap"
>
> This is just a description of what the test does but not why?
Will elaborate the description.
>
> What's the expected behaviour? Was it broken?
>
> I think your whole series will benefit from a cover letter to explain
> the stuff you are fixing.
Yes. There is a patch 0 for v2, but I forgot to send it out.
>
>> succeed.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> .../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
>> .../testing/selftests/bpf/progs/htab_update.c | 29 ++++
>> 2 files changed, 155 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
>> create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
>> new file mode 100644
>> index 000000000000..e2a4034daa79
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
>> @@ -0,0 +1,126 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
>> +#define _GNU_SOURCE
>> +#include <sched.h>
>> +#include <stdbool.h>
>> +#include <test_progs.h>
>> +#include "htab_update.skel.h"
>> +
>> +struct htab_update_ctx {
>> + int fd;
>> + int loop;
>> + bool stop;
>> +};
>> +
>> +static void test_reenter_update(void)
>> +{
>> + struct htab_update *skel;
>> + unsigned int key, value;
>> + int err;
>> +
>> + skel = htab_update__open();
>> + if (!ASSERT_OK_PTR(skel, "htab_update__open"))
>> + return;
>> +
>> + /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
>> + bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
>> + err = htab_update__load(skel);
>> + if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
>> + goto out;
>> +
>> + skel->bss->pid = getpid();
>> + err = htab_update__attach(skel);
>> + if (!ASSERT_OK(err, "htab_update__attach"))
>> + goto out;
>> +
>> + /* Will trigger the reentrancy of bpf_map_update_elem() */
>> + key = 0;
>> + value = 0;
> nit: just move these initializations to the top.
>
>> + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
>> + if (!ASSERT_OK(err, "add element"))
>> + goto out;
>> +
>> + ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
>> +out:
>> + htab_update__destroy(skel);
>> +}
>> +
>> +static void *htab_update_thread(void *arg)
>> +{
>> + struct htab_update_ctx *ctx = arg;
>> + cpu_set_t cpus;
>> + int i;
>> +
>> + /* Pin on CPU 0 */
>> + CPU_ZERO(&cpus);
>> + CPU_SET(0, &cpus);
>> + pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
>> +
>> + i = 0;
>> + while (i++ < ctx->loop && !ctx->stop) {
> nit: for loop?
>
>
>> + unsigned int key = 0, value = 0;
>> + int err;
>> +
>> + err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
>> + if (err) {
>> + ctx->stop = true;
>> + return (void *)(long)err;
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static void test_concurrent_update(void)
>> +{
>> + struct htab_update_ctx ctx;
>> + struct htab_update *skel;
>> + unsigned int i, nr;
>> + pthread_t *tids;
>> + int err;
>> +
>> + skel = htab_update__open_and_load();
>> + if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
>> + return;
>> +
>> + ctx.fd = bpf_map__fd(skel->maps.htab);
>> + ctx.loop = 1000;
>> + ctx.stop = false;
>> +
>> + nr = 4;
>> + tids = calloc(nr, sizeof(*tids));
>> + if (!ASSERT_NEQ(tids, NULL, "no mem"))
>> + goto out;
>> +
>> + for (i = 0; i < nr; i++) {
>> + err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
>> + if (!ASSERT_OK(err, "pthread_create")) {
>> + unsigned int j;
>> +
>> + ctx.stop = true;
>> + for (j = 0; j < i; j++)
>> + pthread_join(tids[j], NULL);
>> + goto out;
>> + }
>> + }
>> +
>> + for (i = 0; i < nr; i++) {
>> + void *thread_err = NULL;
>> +
>> + pthread_join(tids[i], &thread_err);
>> + ASSERT_EQ(thread_err, NULL, "update error");
>> + }
>> +
>> +out:
>> + if (tids)
>> + free(tids);
>> + htab_update__destroy(skel);
>> +}
>> +
>> +void test_htab_update(void)
>> +{
>> + if (test__start_subtest("reenter_update"))
>> + test_reenter_update();
>> + if (test__start_subtest("concurrent_update"))
>> + test_concurrent_update();
>> +}
>> diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
>> new file mode 100644
>> index 000000000000..7481bb30b29b
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/htab_update.c
>> @@ -0,0 +1,29 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
>> +#include <linux/bpf.h>
>> +#include <bpf/bpf_helpers.h>
>> +#include <bpf/bpf_tracing.h>
>> +
>> +char _license[] SEC("license") = "GPL";
>> +
>> +struct {
>> + __uint(type, BPF_MAP_TYPE_HASH);
>> + __uint(max_entries, 1);
>> + __uint(key_size, sizeof(__u32));
>> + __uint(value_size, sizeof(__u32));
>> +} htab SEC(".maps");
>> +
>> +int pid = 0;
>> +int update_err = 0;
>> +
>> +SEC("?fentry/lookup_elem_raw")
>> +int lookup_elem_raw(void *ctx)
>> +{
>> + __u32 key = 0, value = 1;
>> +
>> + if ((bpf_get_current_pid_tgid() >> 32) != pid)
>> + return 0;
>> +
>> + update_err = bpf_map_update_elem(&htab, &key, &value, 0);
>> + return 0;
>> +}
>> --
>> 2.29.2
>>
> .
* Re: [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
From: Hou Tao @ 2022-08-29 1:16 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
>> from both NMI and non-NMI contexts. But since commit 74d862b682f5
>> ("sched: Make migrate_disable/enable() independent of RT"),
>> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
> nit: migrate_disable
Will fix in v3.
>
>> so now map_locked also disallows concurrent updates from normal contexts
>> (e.g. userspace processes) unexpectedly as shown below:
>>
>> process A process B
>>
>> htab_map_update_elem()
>> htab_lock_bucket()
>> migrate_disable()
>> /* return 1 */
>> __this_cpu_inc_return()
>> /* preempted by B */
>>
>> htab_map_update_elem()
>> /* the same bucket as A */
>> htab_lock_bucket()
>> migrate_disable()
>> /* return 2, so lock fails */
>> __this_cpu_inc_return()
>> return -EBUSY
>>
>> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
>> only checking the value of map_locked for nmi context. But it will
>> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
>> through non-tracing program (e.g. fentry program).
>>
>> So fixing it by using disable_preempt() instead of migrate_disable() when
>> increasing htab->map_locked. However when htab_use_raw_lock() is false,
>> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
>> so still use migrate_disable() for spin-lock case and leave the
>> concurrent map updates problem to BPF memory allocator patchset in which
>> !htab_use_raw_lock() case will be removed.
> I think the description needs a bit more clarity and I think you mean
> preempt_disable() instead of disable_preempt
>
> Suggestion:
>
> One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
> being false causes the bucket lock to be a spin lock which can sleep and
> does not work with preempt_disable().
>
> Therefore, use migrate_disable() when using the spinlock instead of
> preempt_disable() and defer fixing concurrent updates to when the kernel
> has its own BPF memory allocator.
Will update the commit message in v3 and thanks for the suggestion.
>
>> Reviewed-by: Hao Luo <haoluo@google.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Fixes tag here please?
Will add "Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent
of RT")" in v3.
>
>
>> ---
>> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b301a63afa2f..6fb3b7fd1622 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>> unsigned long *pflags)
>> {
>> unsigned long flags;
>> + bool use_raw_lock;
>>
>> hash = hash & HASHTAB_MAP_LOCK_MASK;
>>
>> - migrate_disable();
>> + use_raw_lock = htab_use_raw_lock(htab);
>> + if (use_raw_lock)
>> + preempt_disable();
>> + else
>> + migrate_disable();
>> if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>> __this_cpu_dec(*(htab->map_locked[hash]));
>> - migrate_enable();
>> + if (use_raw_lock)
>> + preempt_enable();
>> + else
>> + migrate_enable();
>> return -EBUSY;
>> }
>>
>> - if (htab_use_raw_lock(htab))
>> + if (use_raw_lock)
>> raw_spin_lock_irqsave(&b->raw_lock, flags);
>> else
>> spin_lock_irqsave(&b->lock, flags);
>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>> struct bucket *b, u32 hash,
>> unsigned long flags)
>> {
>> + bool use_raw_lock = htab_use_raw_lock(htab);
>> +
>> hash = hash & HASHTAB_MAP_LOCK_MASK;
>> - if (htab_use_raw_lock(htab))
>> + if (use_raw_lock)
>> raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>> else
>> spin_unlock_irqrestore(&b->lock, flags);
>> __this_cpu_dec(*(htab->map_locked[hash]));
>> - migrate_enable();
>> + if (use_raw_lock)
>> + preempt_enable();
>> + else
>> + migrate_enable();
>> }
>>
>> static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>> --
>> 2.29.2
>>
> .
* Re: [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: Hou Tao @ 2022-08-29 1:19 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
Hi,
On 8/28/2022 8:24 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
>> -EBUSY, it will go to next bucket. Going to next bucket may not only
>> skip the elements in current bucket silently, but also incur
>> out-of-bound memory access or expose kernel memory to userspace if
>> current bucket_cnt is greater than bucket_size or zero.
>>
>> Fixing it by stopping batch operation and returning -EBUSY when
>> htab_lock_bucket() fails, and the application can retry or skip the busy
>> batch as needed.
>>
>> Reported-by: Hao Sun <sunhao.th@gmail.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Please add a Fixes tag here
Will add "Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")"
in v3.
>
>> ---
>> kernel/bpf/hashtab.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 6fb3b7fd1622..eb1263f03e9b 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>> /* do not grab the lock unless need it (bucket_cnt > 0). */
>> if (locked) {
>> ret = htab_lock_bucket(htab, b, batch, &flags);
>> - if (ret)
>> - goto next_batch;
>> + if (ret) {
>> + rcu_read_unlock();
>> + bpf_enable_instrumentation();
>> + goto after_loop;
>> + }
>> }
>>
>> bucket_cnt = 0;
>> --
>> 2.29.2
>>
> .