All of lore.kernel.org
* [PATCH bpf-next 0/3] make bpf_task_storage_busy being preemption-safe
@ 2022-08-29 14:27 Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy Hou Tao
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Hou Tao @ 2022-08-29 14:27 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

This patchset aims to make updates of bpf_task_storage_busy
preemption-safe. The problem is that on some architectures (e.g. arm64),
__this_cpu_{inc|dec|inc_return} are neither preemption-safe nor
IRQ-safe, so concurrent lookups or updates on the same task local
storage and the same CPU may leave bpf_task_storage_busy imbalanced,
and bpf_task_storage_trylock() on that CPU will always fail.

Patch 1 fixes the problem by using preemption/IRQ-safe per-cpu helpers.
And patch 2 & patch 3 add a test case for the problem. Comments are
always welcome.

Regards,
Tao

Hou Tao (3):
  bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  selftests/bpf: Move sys_pidfd_open() into test_progs.h
  selftests/bpf: Test concurrent updates on bpf_task_storage_busy

 kernel/bpf/bpf_local_storage.c                |  4 +-
 kernel/bpf/bpf_task_storage.c                 |  8 +-
 .../bpf/prog_tests/task_local_storage.c       | 91 +++++++++++++++++++
 .../selftests/bpf/prog_tests/test_bprm_opts.c |  9 --
 .../bpf/prog_tests/test_local_storage.c       |  9 --
 tools/testing/selftests/bpf/test_progs.h      |  9 ++
 6 files changed, 106 insertions(+), 24 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  2022-08-29 14:27 [PATCH bpf-next 0/3] make bpf_task_storage_busy being preemption-safe Hou Tao
@ 2022-08-29 14:27 ` Hou Tao
  2022-08-30  0:52   ` Martin KaFai Lau
  2022-08-29 14:27 ` [PATCH bpf-next 2/3] selftests/bpf: Move sys_pidfd_open() into test_progs.h Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy Hou Tao
  2 siblings, 1 reply; 12+ messages in thread
From: Hou Tao @ 2022-08-29 14:27 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

Now migrate_disable() does not disable preemption, and on some
architectures (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
preemption-safe nor IRQ-safe, so concurrent lookups or updates on
the same task local storage and the same CPU may leave
bpf_task_storage_busy imbalanced, and bpf_task_storage_trylock()
will always fail.

Fix it by using this_cpu_{inc|dec|inc_return} when manipulating
bpf_task_storage_busy.

Fixes: bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/bpf_local_storage.c | 4 ++--
 kernel/bpf/bpf_task_storage.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 4ee2e7286c23..802fc15b0d73 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -555,11 +555,11 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
 				struct bpf_local_storage_elem, map_node))) {
 			if (busy_counter) {
 				migrate_disable();
-				__this_cpu_inc(*busy_counter);
+				this_cpu_inc(*busy_counter);
 			}
 			bpf_selem_unlink(selem, false);
 			if (busy_counter) {
-				__this_cpu_dec(*busy_counter);
+				this_cpu_dec(*busy_counter);
 				migrate_enable();
 			}
 			cond_resched_rcu();
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index e9014dc62682..6f290623347e 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
 static void bpf_task_storage_lock(void)
 {
 	migrate_disable();
-	__this_cpu_inc(bpf_task_storage_busy);
+	this_cpu_inc(bpf_task_storage_busy);
 }
 
 static void bpf_task_storage_unlock(void)
 {
-	__this_cpu_dec(bpf_task_storage_busy);
+	this_cpu_dec(bpf_task_storage_busy);
 	migrate_enable();
 }
 
 static bool bpf_task_storage_trylock(void)
 {
 	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
-		__this_cpu_dec(bpf_task_storage_busy);
+	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
+		this_cpu_dec(bpf_task_storage_busy);
 		migrate_enable();
 		return false;
 	}
-- 
2.29.2



* [PATCH bpf-next 2/3] selftests/bpf: Move sys_pidfd_open() into test_progs.h
  2022-08-29 14:27 [PATCH bpf-next 0/3] make bpf_task_storage_busy being preemption-safe Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy Hou Tao
@ 2022-08-29 14:27 ` Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy Hou Tao
  2 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2022-08-29 14:27 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

sys_pidfd_open() is defined twice, in both test_bprm_opts.c and
test_local_storage.c, so move it to test_progs.h. It will also be
used in task_local_storage.c.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 tools/testing/selftests/bpf/prog_tests/test_bprm_opts.c  | 9 ---------
 .../selftests/bpf/prog_tests/test_local_storage.c        | 9 ---------
 tools/testing/selftests/bpf/test_progs.h                 | 9 +++++++++
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_bprm_opts.c b/tools/testing/selftests/bpf/prog_tests/test_bprm_opts.c
index 2559bb775762..17951999642a 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_bprm_opts.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_bprm_opts.c
@@ -10,17 +10,8 @@
 #include "bprm_opts.skel.h"
 #include "network_helpers.h"
 
-#ifndef __NR_pidfd_open
-#define __NR_pidfd_open 434
-#endif
-
 static const char * const bash_envp[] = { "TMPDIR=shouldnotbeset", NULL };
 
-static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
-{
-	return syscall(__NR_pidfd_open, pid, flags);
-}
-
 static int update_storage(int map_fd, int secureexec)
 {
 	int task_fd, ret = 0;
diff --git a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
index 26ac26a88026..f8830c92c043 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
@@ -12,15 +12,6 @@
 #include "local_storage.skel.h"
 #include "network_helpers.h"
 
-#ifndef __NR_pidfd_open
-#define __NR_pidfd_open 434
-#endif
-
-static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
-{
-	return syscall(__NR_pidfd_open, pid, flags);
-}
-
 static unsigned int duration;
 
 #define TEST_STORAGE_VALUE 0xbeefdead
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 5fe1365c2bb1..6feb8595bbac 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -396,3 +396,12 @@ int trigger_module_test_write(int write_sz);
 #endif
 
 #define BPF_TESTMOD_TEST_FILE "/sys/kernel/bpf_testmod"
+
+#ifndef __NR_pidfd_open
+#define __NR_pidfd_open 434
+#endif
+
+static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
+{
+	return syscall(__NR_pidfd_open, pid, flags);
+}
-- 
2.29.2



* [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy
  2022-08-29 14:27 [PATCH bpf-next 0/3] make bpf_task_storage_busy being preemption-safe Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy Hou Tao
  2022-08-29 14:27 ` [PATCH bpf-next 2/3] selftests/bpf: Move sys_pidfd_open() into test_progs.h Hou Tao
@ 2022-08-29 14:27 ` Hou Tao
  2022-08-30  1:13   ` Martin KaFai Lau
  2 siblings, 1 reply; 12+ messages in thread
From: Hou Tao @ 2022-08-29 14:27 UTC (permalink / raw)
  To: bpf
  Cc: Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Lorenz Bauer, houtao1

From: Hou Tao <houtao1@huawei.com>

When there are concurrent task local storage lookup operations, if
updates of the per-cpu bpf_task_storage_busy counter are not
preemption-safe, some updates will be lost due to interleaving: the
final value of bpf_task_storage_busy will not be zero, and
bpf_task_storage_trylock() on that CPU will fail forever.

So add a test case to ensure the updates of per-cpu
bpf_task_storage_busy are preemption-safe.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../bpf/prog_tests/task_local_storage.c       | 91 +++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
index 035c263aab1b..ab8ee42a19e4 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -3,6 +3,7 @@
 
 #define _GNU_SOURCE         /* See feature_test_macros(7) */
 #include <unistd.h>
+#include <sched.h>
 #include <sys/syscall.h>   /* For SYS_xxx definitions */
 #include <sys/types.h>
 #include <test_progs.h>
@@ -10,6 +11,14 @@
 #include "task_local_storage_exit_creds.skel.h"
 #include "task_ls_recursion.skel.h"
 
+struct lookup_ctx {
+	bool start;
+	bool stop;
+	int pid_fd;
+	int map_fd;
+	int loop;
+};
+
 static void test_sys_enter_exit(void)
 {
 	struct task_local_storage *skel;
@@ -81,6 +90,86 @@ static void test_recursion(void)
 	task_ls_recursion__destroy(skel);
 }
 
+static void *lookup_fn(void *arg)
+{
+	struct lookup_ctx *ctx = arg;
+	long value;
+	int i = 0;
+
+	while (!ctx->start)
+		usleep(1);
+
+	while (!ctx->stop && i++ < ctx->loop)
+		bpf_map_lookup_elem(ctx->map_fd, &ctx->pid_fd, &value);
+	return NULL;
+}
+
+static void test_preemption(void)
+{
+	struct task_local_storage *skel;
+	struct lookup_ctx ctx;
+	unsigned int i, nr;
+	cpu_set_t old, new;
+	pthread_t *tids;
+	int err;
+
+	skel = task_local_storage__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	/* Save the old affinity setting */
+	sched_getaffinity(getpid(), sizeof(old), &old);
+
+	/* Pinned on CPU 0 */
+	CPU_ZERO(&new);
+	CPU_SET(0, &new);
+	sched_setaffinity(getpid(), sizeof(new), &new);
+
+	nr = 256;
+	tids = calloc(nr, sizeof(*tids));
+	if (!ASSERT_NEQ(tids, NULL, "no mem"))
+		goto out;
+
+	ctx.start = false;
+	ctx.stop = false;
+	ctx.pid_fd = sys_pidfd_open(getpid(), 0);
+	ctx.map_fd = bpf_map__fd(skel->maps.enter_id);
+	ctx.loop = 8192;
+	for (i = 0; i < nr; i++) {
+		err = pthread_create(&tids[i], NULL, lookup_fn, &ctx);
+		if (err) {
+			unsigned int j;
+
+			ASSERT_OK(err, "pthread_create");
+
+			ctx.stop = true;
+			ctx.start = true;
+			for (j = 0; j < i; j++)
+				pthread_join(tids[j], NULL);
+			goto out;
+		}
+	}
+
+	ctx.start = true;
+	for (i = 0; i < nr; i++)
+		pthread_join(tids[i], NULL);
+
+	err = task_local_storage__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto out;
+
+	skel->bss->target_pid = syscall(SYS_gettid);
+	syscall(SYS_gettid);
+
+	/* If bpf_task_storage_trylock() fails, enter_cnt will be 0 */
+	ASSERT_EQ(skel->bss->enter_cnt, 1, "enter_cnt");
+out:
+	free(tids);
+	task_local_storage__destroy(skel);
+	/* Restore affinity setting */
+	sched_setaffinity(getpid(), sizeof(old), &old);
+}
+
 void test_task_local_storage(void)
 {
 	if (test__start_subtest("sys_enter_exit"))
@@ -89,4 +178,6 @@ void test_task_local_storage(void)
 		test_exit_creds();
 	if (test__start_subtest("recursion"))
 		test_recursion();
+	if (test__start_subtest("preemption"))
+		test_preemption();
 }
-- 
2.29.2



* Re: [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  2022-08-29 14:27 ` [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy Hou Tao
@ 2022-08-30  0:52   ` Martin KaFai Lau
  2022-08-30  2:21     ` Hou Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2022-08-30  0:52 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Now migrate_disable() does not disable preemption and under some
> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
> the same task local storage and the same CPU may make
> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
> will always fail.
> 
> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
> bpf_task_storage_busy.
> 
> Fixes: bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/bpf_local_storage.c | 4 ++--
>  kernel/bpf/bpf_task_storage.c  | 8 ++++----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 4ee2e7286c23..802fc15b0d73 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -555,11 +555,11 @@ void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
>  				struct bpf_local_storage_elem, map_node))) {
>  			if (busy_counter) {
>  				migrate_disable();
> -				__this_cpu_inc(*busy_counter);
> +				this_cpu_inc(*busy_counter);
>  			}
>  			bpf_selem_unlink(selem, false);
>  			if (busy_counter) {
> -				__this_cpu_dec(*busy_counter);
> +				this_cpu_dec(*busy_counter);
>  				migrate_enable();
>  			}
>  			cond_resched_rcu();
> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> index e9014dc62682..6f290623347e 100644
> --- a/kernel/bpf/bpf_task_storage.c
> +++ b/kernel/bpf/bpf_task_storage.c
> @@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
>  static void bpf_task_storage_lock(void)
>  {
>  	migrate_disable();
> -	__this_cpu_inc(bpf_task_storage_busy);
> +	this_cpu_inc(bpf_task_storage_busy);
>  }
>  
>  static void bpf_task_storage_unlock(void)
>  {
> -	__this_cpu_dec(bpf_task_storage_busy);
> +	this_cpu_dec(bpf_task_storage_busy);
>  	migrate_enable();
>  }
>  
>  static bool bpf_task_storage_trylock(void)
>  {
>  	migrate_disable();
> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> -		__this_cpu_dec(bpf_task_storage_busy);
> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> +		this_cpu_dec(bpf_task_storage_busy);
Is this change only needed here and not in the htab fix [0],
or are you planning to address it there separately?

[0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/


* Re: [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy
  2022-08-29 14:27 ` [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy Hou Tao
@ 2022-08-30  1:13   ` Martin KaFai Lau
  2022-08-30  2:37     ` Hou Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2022-08-30  1:13 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

On Mon, Aug 29, 2022 at 10:27:52PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> When there are concurrent task local storage lookup operations,
> if updates on per-cpu bpf_task_storage_busy is not preemption-safe,
> some updates will be lost due to interleave, the final value of
> bpf_task_storage_busy will not be zero and bpf_task_storage_trylock()
> on specific cpu will fail forever.
> 
> So add a test case to ensure the update of per-cpu bpf_task_storage_busy
> is preemption-safe.
This test took my setup 1.5 minutes to run,
and I cannot reproduce the failure after running the test in a loop.

Can it be reproduced in a much shorter time ?
If not, test_maps is probably a better place to do the test.

I assume it can be reproduced on arm with this test?  Or can it
also be reproduced on other platforms with a different kconfig?
Please paste the test failure message and the platform/kconfig
needed to reproduce it in the commit message.


* Re: [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  2022-08-30  0:52   ` Martin KaFai Lau
@ 2022-08-30  2:21     ` Hou Tao
  2022-08-31  0:41       ` Martin KaFai Lau
  0 siblings, 1 reply; 12+ messages in thread
From: Hou Tao @ 2022-08-30  2:21 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

Hi,

On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Now migrate_disable() does not disable preemption and under some
>> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
>> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
>> the same task local storage and the same CPU may make
>> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
>> will always fail.
>>
>> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
>> bpf_task_storage_busy.
SNIP
>>  static bool bpf_task_storage_trylock(void)
>>  {
>>  	migrate_disable();
>> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> -		__this_cpu_dec(bpf_task_storage_busy);
>> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>> +		this_cpu_dec(bpf_task_storage_busy);
> This change is only needed here but not in the htab fix [0]
> or you are planning to address it separately ?
>
> [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
> .
For htab_lock_bucket() in the hash table, in theory there will be a
problem, as shown below, but I cannot reproduce it on an ARM64 host:

*p = 0

// process A
r0 = *p
r0 += 1
            // process B
            r1 = *p
// *p = 1
*p = r0
            r1 += 1
            // *p = 1
            *p = r1

// r0 = 1
r0 = *p
            // r1 = 1
            r1 = *p

In the hash table fixes, migrate_disable() in htab_lock_bucket() is
replaced by preempt_disable(), so the above case will be impossible. And
if process A is preempted by an IRQ, __this_cpu_inc_return() will be OK.

Besides the bpf hash table, there are other __this_cpu_inc_return()
users in bpf (e.g. __bpf_prog_enter()); maybe I should fix those usages
as well?




* Re: [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy
  2022-08-30  1:13   ` Martin KaFai Lau
@ 2022-08-30  2:37     ` Hou Tao
  2022-08-31  0:47       ` Martin KaFai Lau
  0 siblings, 1 reply; 12+ messages in thread
From: Hou Tao @ 2022-08-30  2:37 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

Hi,

On 8/30/2022 9:13 AM, Martin KaFai Lau wrote:
> On Mon, Aug 29, 2022 at 10:27:52PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> When there are concurrent task local storage lookup operations,
>> if updates on per-cpu bpf_task_storage_busy is not preemption-safe,
>> some updates will be lost due to interleave, the final value of
>> bpf_task_storage_busy will not be zero and bpf_task_storage_trylock()
>> on specific cpu will fail forever.
>>
>> So add a test case to ensure the update of per-cpu bpf_task_storage_busy
>> is preemption-safe.
> This test took my setup 1.5 minute to run
> and cannot reproduce after running the test in a loop.
>
> Can it be reproduced in a much shorter time ?
> If not, test_maps is probably a better place to do the test.
I think the answer is no. I thought about adding the test to test_maps,
but the test case needs to run a bpf program to check whether the value
of bpf_task_storage_busy is OK, so for simplicity I added it to
test_progs. If the running time is a problem, I can move it into
test_maps.
> I assume it can be reproduced in arm with this test?  Or it can
> also be reproduced in other platforms with different kconfig.
> Please paste the test failure message and the platform/kconfig
> to reproduce it in the commit message.
On arm64 it can be reproduced probabilistically when CONFIG_PREEMPT is
enabled on a 2-CPU VM, as shown below. You can try to increase the
values of nr and loop if it still cannot be reproduced.

test_preemption:PASS:skel_open_and_load 0 nsec
test_preemption:PASS:no mem 0 nsec
test_preemption:PASS:skel_attach 0 nsec
test_preemption:FAIL:bpf_task_storage_get fails unexpected bpf_task_storage_get
fails: actual 0 != expected 1
#174/4   task_local_storage/preemption:FAIL
#174     task_local_storage:FAIL

All error logs:
test_preemption:PASS:skel_open_and_load 0 nsec
test_preemption:PASS:no mem 0 nsec
test_preemption:PASS:skel_attach 0 nsec
test_preemption:FAIL:bpf_task_storage_get fails unexpected bpf_task_storage_get
fails: actual 0 != expected 1
#174/4   task_local_storage/preemption:FAIL
#174     task_local_storage:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

On x86-64, __this_cpu_{inc|dec} are atomic, so it is not possible to
reproduce the problem there.




* Re: [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  2022-08-30  2:21     ` Hou Tao
@ 2022-08-31  0:41       ` Martin KaFai Lau
  2022-08-31  2:04         ` Hou Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2022-08-31  0:41 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
> Hi,
> 
> On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
> > On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Now migrate_disable() does not disable preemption and under some
> >> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
> >> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
> >> the same task local storage and the same CPU may make
> >> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
> >> will always fail.
> >>
> >> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
> >> bpf_task_storage_busy.
> SNIP
> >>  static bool bpf_task_storage_trylock(void)
> >>  {
> >>  	migrate_disable();
> >> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> >> -		__this_cpu_dec(bpf_task_storage_busy);
> >> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
> >> +		this_cpu_dec(bpf_task_storage_busy);
> > This change is only needed here but not in the htab fix [0]
> > or you are planning to address it separately ?
> >
> > [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
> > .
> For htab_lock_bucket() in hash-table, in theory there will be problem as shown
> below, but I can not reproduce it on ARM64 host:
> 
> *p = 0
> 
> // process A
> r0 = *p
> r0 += 1
>             // process B
>             r1 = *p
> // *p = 1
> *p = r0
>             r1 += 1
>             // *p = 1
>             *p = r1
> 
> // r0 = 1
> r0 = *p
>             // r1 = 1
>             r1 = *p
> 
> In hash table fixes, migrate_disable() in htab_lock_bucket()  is replaced by
> preempt_disable(), so the above case will be impossible. And if process A is
> preempted by IRQ, __this_cpu_inc_return will be OK.
hmm... iiuc, it is fine for the preempted-by-IRQ case because the
operations won't be interleaved as in the process A/B example above?
That should also be true for the task-storage case here,
meaning only CONFIG_PREEMPT can reproduce it?  If that is the
case, please also mention it in the commit message.


* Re: [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy
  2022-08-30  2:37     ` Hou Tao
@ 2022-08-31  0:47       ` Martin KaFai Lau
  2022-08-31  2:07         ` Hou Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Martin KaFai Lau @ 2022-08-31  0:47 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

On Tue, Aug 30, 2022 at 10:37:17AM +0800, Hou Tao wrote:
> Hi,
> 
> On 8/30/2022 9:13 AM, Martin KaFai Lau wrote:
> > On Mon, Aug 29, 2022 at 10:27:52PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> When there are concurrent task local storage lookup operations,
> >> if updates on per-cpu bpf_task_storage_busy is not preemption-safe,
> >> some updates will be lost due to interleave, the final value of
> >> bpf_task_storage_busy will not be zero and bpf_task_storage_trylock()
> >> on specific cpu will fail forever.
> >>
> >> So add a test case to ensure the update of per-cpu bpf_task_storage_busy
> >> is preemption-safe.
> > This test took my setup 1.5 minute to run
> > and cannot reproduce after running the test in a loop.
> >
> > Can it be reproduced in a much shorter time ?
> > If not, test_maps is probably a better place to do the test.
> I think the answer is No. I have think about adding the test in test_maps, but
> the test case needs running a bpf program to check whether the value of
> bpf_task_storage_busy is OK, so for simplicity I add it in test_progs.
> If the running time is the problem, I can move it into test_maps.
> > I assume it can be reproduced in arm with this test?  Or it can
> > also be reproduced in other platforms with different kconfig.
> > Please paste the test failure message and the platform/kconfig
> > to reproduce it in the commit message.
> On arm64 it can be reproduced probabilistically when CONFIG_PREEMPT is enabled
> on 2-cpus VM as show below. You can try to increase the value of nr and loop if
> it still can not be reproduced.
I don't have an arm64 environment now, so I cannot try it out.
Please move the test to test_maps.
If it is CONFIG_PREEMPT only, you can also check if CONFIG_PREEMPT
is set or not and skip the test.  Take a look at __kconfig usage
under bpf/progs.


* Re: [PATCH bpf-next 1/3] bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
  2022-08-31  0:41       ` Martin KaFai Lau
@ 2022-08-31  2:04         ` Hou Tao
  0 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2022-08-31  2:04 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

Hi,

On 8/31/2022 8:41 AM, Martin KaFai Lau wrote:
> On Tue, Aug 30, 2022 at 10:21:30AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 8/30/2022 8:52 AM, Martin KaFai Lau wrote:
>>> On Mon, Aug 29, 2022 at 10:27:50PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Now migrate_disable() does not disable preemption and under some
>>>> architecture (e.g. arm64) __this_cpu_{inc|dec|inc_return} are neither
>>>> preemption-safe nor IRQ-safe, so the concurrent lookups or updates on
>>>> the same task local storage and the same CPU may make
>>>> bpf_task_storage_busy be imbalanced, and bpf_task_storage_trylock()
>>>> will always fail.
>>>>
>>>> Fixing it by using this_cpu_{inc|dec|inc_return} when manipulating
>>>> bpf_task_storage_busy.
>> SNIP
>>>>  static bool bpf_task_storage_trylock(void)
>>>>  {
>>>>  	migrate_disable();
>>>> -	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>>>> -		__this_cpu_dec(bpf_task_storage_busy);
>>>> +	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
>>>> +		this_cpu_dec(bpf_task_storage_busy);
>>> This change is only needed here but not in the htab fix [0]
>>> or you are planning to address it separately ?
>>>
>>> [0]: https://lore.kernel.org/bpf/20220829023709.1958204-2-houtao@huaweicloud.com/
>>> .
>> For htab_lock_bucket() in hash-table, in theory there will be problem as shown
>> below, but I can not reproduce it on ARM64 host:
>>
>> *p = 0
>>
>> // process A
>> r0 = *p
>> r0 += 1
>>             // process B
>>             r1 = *p
>> // *p = 1
>> *p = r0
>>             r1 += 1
>>             // *p = 1
>>             *p = r1
>>
>> // r0 = 1
>> r0 = *p
>>             // r1 = 1
>>             r1 = *p
>>
>> In hash table fixes, migrate_disable() in htab_lock_bucket()  is replaced by
>> preempt_disable(), so the above case will be impossible. And if process A is
>> preempted by IRQ, __this_cpu_inc_return will be OK.
> hmm... iiuc, it is fine for the preempted by IRQ case because the
> operations won't be interleaved as the process A/B example above?
> That should also be true for the task-storage case here,
> meaning only CONFIG_PREEMPT can reproduce it?  If that is the
> case, please also mention that in the commit message.
> .
Yes. There will be no interleaving if it is preempted by an IRQ, and
that is also true for the task-storage case. CONFIG_PREEMPT can
reproduce it, and in theory CONFIG_PREEMPT_RT is also possible. Will add
that to the commit message.




* Re: [PATCH bpf-next 3/3] selftests/bpf: Test concurrent updates on bpf_task_storage_busy
  2022-08-31  0:47       ` Martin KaFai Lau
@ 2022-08-31  2:07         ` Hou Tao
  0 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2022-08-31  2:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Song Liu, Sebastian Andrzej Siewior, Hao Sun, Hao Luo,
	Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
	Daniel Borkmann, KP Singh, David S . Miller, Jakub Kicinski,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
	houtao1

Hi,

On 8/31/2022 8:47 AM, Martin KaFai Lau wrote:
> On Tue, Aug 30, 2022 at 10:37:17AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 8/30/2022 9:13 AM, Martin KaFai Lau wrote:
>>> On Mon, Aug 29, 2022 at 10:27:52PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> When there are concurrent task local storage lookup operations,
>>>> if updates on per-cpu bpf_task_storage_busy is not preemption-safe,
>>>> some updates will be lost due to interleave, the final value of
>>>> bpf_task_storage_busy will not be zero and bpf_task_storage_trylock()
>>>> on specific cpu will fail forever.
>>>>
>>>> So add a test case to ensure the update of per-cpu bpf_task_storage_busy
>>>> is preemption-safe.
>>> This test took my setup 1.5 minute to run
>>> and cannot reproduce after running the test in a loop.
>>>
>>> Can it be reproduced in a much shorter time ?
>>> If not, test_maps is probably a better place to do the test.
>> I think the answer is No. I have think about adding the test in test_maps, but
>> the test case needs running a bpf program to check whether the value of
>> bpf_task_storage_busy is OK, so for simplicity I add it in test_progs.
>> If the running time is the problem, I can move it into test_maps.
>>> I assume it can be reproduced in arm with this test?  Or it can
>>> also be reproduced in other platforms with different kconfig.
>>> Please paste the test failure message and the platform/kconfig
>>> to reproduce it in the commit message.
>> On arm64 it can be reproduced probabilistically when CONFIG_PREEMPT is enabled
>> on 2-cpus VM as show below. You can try to increase the value of nr and loop if
>> it still can not be reproduced.
> I don't have arm64 environment now, so cannot try it out.
For arm, raw_cpu_add_4() is also not preemption-safe, so I think it is
also possible there.
> Please move the test to test_maps.
> If it is CONFIG_PREEMPT only, you can also check if CONFIG_PREEMPT
> is set or not and skip the test.  Take a look at __kconfig usage
> under bpf/progs.
Will do.
> .

